PREFACE


Largely because of the impetus gained during World War II, com- 
munication and control engineering have reached a very high level 
of development today. Many perhaps do not realize that the present 
age is ready for a significant turn in the development toward far greater 
heights than we have ever anticipated. The point of departure may 
well be the recasting and unifying of the theories of control and com- 
munication in the machine and in the animal on a statistical basis. 
The philosophy of this subject is contained in my book entitled 
Cybernetics.* The present monograph represents one phase of the new 
theory pertaining to the methods and techniques in the design of com- 
munication systems; it was first published during the war as a classified 
report to Section D2, National Defense Research Committee, and is
now released for general use. In order to supplement the present text 
by less complete but simpler engineering methods two notes by 
Professor Norman Levinson, in which he develops some of the main 
ideas in a simpler mathematical form, have been added as Appendixes 
B and C. This material, which first appeared in the Journal of Mathematics
and Physics, is reprinted by permission.

In the main, the mathematical developments here presented are new. 
However, they are along the lines suggested by A. Kolmogoroff 
(Interpolation und Extrapolation von stationären zufälligen Folgen,
Bulletin de l’académie des sciences de U.R.S.S., Ser. Math. 5, pp. 3-14,
1941; cf. also P. A. Kosulajeff, Sur les problèmes d’interpolation et
d’extrapolation des suites stationnaires, Comptes rendus de l’académie
des sciences de U.R.S.S., Vol. 30, pp. 13-17, 1941). An earlier note of
Kolmogoroff appears in the Paris Comptes rendus for 1939. 

To the several colleagues who have helped me by their criticism, and 
in particular to President Karl T. Compton, Professor H. M. James, 
Dr. Warren Weaver, Mr. Julian H. Bigelow, and Professor Norman 
Levinson, I wish to express my gratitude. Also, I wish to give credit 
to Mr. Gordon Raisbeck for his meticulous attention to the proofreading
of this book.

Norbert Wiener 
Cambridge, Massachusetts 
March, 1949 


*Published by The M.I.T. Press, Cambridge, Massachusetts 
INTRODUCTION


0.1 The Purpose of This Book 


This book represents an attempt to unite the theory and practice of 
two fields of work which are of vital importance in the present emer- 
gency, and which have a complete natural methodological unity, but 
which have up to the present drawn their inspiration from two entirely 
distinct traditions, and which are widely different in their vocabulary 
and the training of their personnel. These two fields are those of time
series in statistics and of communication engineering. 


0.2 Time Series 


Time series are sequences, discrete or continuous, of quantitative 
data assigned to specific moments in time and studied with respect to 
the statistics of their distribution in time. They may be simple, in which 
case they consist of a single numerically given observation at each 
moment of the discrete or continuous base sequence; or multiple, in 
which case they consist of a number of separate quantities tabulated 
according to a time common to all. The closing price of wheat at Chicago, 
tabulated by days, is a simple time series. The closing prices of all 
grains constitute a multiple time series. 

The fields of statistical practice in which time series arise divide 
themselves roughly into two categories: the statistics of economic, 
sociologic, and short-time biological data, on the one hand; and the 
statistics of astronomical, meteorological, geophysical, and physical 
data, on the other. In the first category our time series are relatively 
short under anything like comparable basic conditions. These short runs 
forbid the drawing of conclusions involving the variable or variables at 
a distant future time to any high degree of precision. The whole emphasis 
is on the drawing of some sort of conclusion with a reasonable expecta- 
tion that it be significant and accurate within a very liberal error. On 
the other hand, since the quantities measured are often subject to 
human control, questions of policy and of the effect of a change of policy 
on the statistical character of the time series assume much importance. 

In the second category of time series, typified by series of meteoro- 
logical data, long runs of accurate data taken under substantially uniform 
external conditions are the rule rather than the exception. Accordingly 
quite refined methods of using these data for prediction and other
related purposes are worth considering. On the other hand, owing to the 
length of the runs and the relative intractability of the physical bases 
of the phenomena, policy questions do not appear so generally as in the 
economic case. Of course, as the problem of flood control will show, 
they do appear, and the distinction between the two types of statistical 
work is not perfectly sharp. 


0.3 Communication Engineering 


Let us now turn from the study of time series to that of communica- 
tion engineering. This is the study of messages and their transmission, 
whether these messages be sequences of dots and dashes, as in the Morse 
code or the teletypewriter, or sound-wave patterns, as in the telephone 
or phonograph, or patterns representing visual images, as in telephoto 
service and television. In all communication engineering—if we do not 
count such rude expedients as the pigeon post as communication engi- 
neering—the message to be transmitted is represented as some sort 
of array of measurable quantities distributed in time. In other words, 
by coding or the use of the voice or scanning, the message to be trans- 
mitted is developed into a time series. This time series is then subjected 
to transmission by an apparatus which carries it through a succession
of stages, at each of which the time series appears by transformation 
as a new time series. These operations, although carried out by elec- 
trical or mechanical or other such means, are in no way essentially 
different from the operations computationally carried out by the time- 
series statistician with slide rule and computing machine. 

The proper field of communication engineering is far wider than that 
generally assigned to it. Communication engineering concerns itself 
with the transmission of messages. For the existence of a message, it is 
indeed essential that variable information be transmitted. The trans- 
mission of a single fixed item of information is of no communicative 
value. We must have a repertory of possible messages, and over this 
repertory a measure determining the probability of these messages. 

A message need not be the result of a conscious human effort for the 
transmission of ideas. For example, the records of current and voltage 
kept on the instruments of an automatic substation are as truly messages 
as a telephone conversation. From this point of view, the record of the 
thickness of a roll of paper kept by a condenser working an automatic 
stop on a Fourdrinier machine is also a message, and the servo-mechanism
stopping the machine at a flaw belongs to the field of communication
engineering, as indeed do all servo-mechanisms. This fundamental
unity of all fields of communication engineering has been obscured by
the traditional division of engineering into what the Germans call 
Starkstromtechnik and Schwachstromtechnik—the engineering of strong 
currents and the engineering of weak currents. There has been a tendency 
to identify this split with that between power and communications 
engineering. As a result of the consequent division of personnel, the 
communications problems of the power engineer are often handled by
a technique different from that which the ordinary communications 
engineer employs, and useful notions such as that of impedance or 
voltage ratio as a function of the frequency are often much neglected. 

This is still further accentuated by the wide difference in the frequency
range of interest to the telephone engineer and to the servo-mechanism
engineer. Ordinary passive electric circuits have time constants of a
small fraction of a second. For time constants of seconds or minutes,
passive circuits require impedances of orders of magnitude not at all to 
be realized by the conventional technique of inductances and capacities. 
This difference in technique has often blinded the communications and 
the power engineers to the essential unity of their problems. 

It is, of course, true that the main function of power engineering is 
the transmission of energy or power from one place to another together 
with its generation by appropriate generators and its employment by 
appropriate motors or lamps or other such apparatus. So long as this 
is not associated with the transmission of a particular pattern, as for 
example in processes of automatic control, power engineering remains
a separate entity with its own technique. On the other hand, in that
moment in which circuits of large power are used to transmit a pattern 
or to control the time behavior of a machine, power engineering differs 
from communication engineering only in the energy levels involved and 
in the particular apparatus used suitable for such energy levels, but is 
not in fact a separate branch of engineering from communications. 


0.4 Techniques of Time Series and Communication Engineering 
Contrasted 


Let us now see what are the fields from which the present-day statistician
and the present-day communication engineer draw their techniques.
First, let us consider the statistician. Behind all statistical work
lies the theory of probabilities. The events which actually happen in a 
single instance are always referred to a collection of events which might 
have happened; and to different subcollections of such events, weights 
or probabilities are assigned, varying from zero or complete improb- 
ability (rather than certainty of not occurring), to unity or complete 
probability (rather than certainty of occurring). The strictly mathematical
theory corresponding to this theory of probability is the theory
of measure, particularly in the form given by Lebesgue. A statistical 
method, as for example a method of extrapolating a time series into the
future, is judged by the probability with which it will yield an answer 
correct within certain bounds, or by the mean (taken with respect to 
probability) of some positive function or norm of the error contained in 
its answer. 


0.41 The Ensemble 


In other words, the statistical theory of time series does not consider 
the individual time series by itself, but a distribution or ensemble of 
time series. Thus the mathematical operations to which a time series is 
subjected are judged, not by their effect in a particular case, but by 
their average effect. While one does not ordinarily think of communica- 
tion engineering in the same terms, this statistical point of view is 
equally valid there. No apparatus for conveying information is useful 
unless it is designed to operate, not on a particular message, but on a
set of messages, and its effectiveness is to be judged by the way in which 
it performs on the average on messages of this set. “On the average” 
means that we have a way of estimating which messages are frequent and 
which rare or, in other words, that we have a measure or probability of 
possible messages. The apparatus to be used for a particular purpose is 
that which gives the best result “on the average” in an appropriate
sense of the word “average.” 


0.42 Correlation 


Another familiar tool of the statistician is the theory of correlation. 
If $x_1, \dots, x_n$ are the numbers of one set and $y_1, \dots, y_n$ the numbers of
another set, the coefficient of correlation between the two is

$$\frac{\sum_{1}^{n} x_j y_j}{\sqrt{\sum_{1}^{n} x_j^2 \, \sum_{1}^{n} y_j^2}} \tag{0.421}$$

This quantity must lie between $-1$ and $1$. Mathematically it is to be
interpreted as the cosine of the angle between the two vectors $(x_1, \dots, x_n)$
and $(y_1, \dots, y_n)$. When this quantity is nearly $\pm 1$, there is a strong
degree of direct or reverse linear dependence between the $x_j$'s and $y_j$'s.
On the other hand, if this quantity is nearly 0, a low degree of linear
dependence between the $x_j$'s and $y_j$'s is indicated. This correlation
coefficient has been normalized by the denominator which has been
adopted. For many purposes it is useful to consider the numerator
$\sum_{1}^{n} x_j y_j$ as the correlation.
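As a minimal sketch of the coefficient (0.421), the following Python fragment forms the sum of products and divides it by the square root of the product of the sums of squares. The function name and the sample numbers are assumptions introduced for illustration only; as in (0.421), no mean is subtracted from either set.

```python
import math

def correlation_coefficient(xs, ys):
    """Sum of products divided by the root of the product of sums of squares."""
    if len(xs) != len(ys):
        raise ValueError("the two sets must contain the same number of values")
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = math.sqrt(sum(x * x for x in xs) * sum(y * y for y in ys))
    if denominator == 0.0:
        raise ValueError("the coefficient is undefined for an all-zero set")
    return numerator / denominator

# Hypothetical data: proportional, oppositely proportional, and perpendicular sets.
xs = [1.0, 2.0, 3.0]
direct = correlation_coefficient(xs, [2.0, 4.0, 6.0])
reverse = correlation_coefficient(xs, [-1.0, -2.0, -3.0])
perpendicular = correlation_coefficient([1.0, 0.0], [0.0, 1.0])
```

The three sample pairs correspond to vectors that are parallel, antiparallel, and at right angles, which is the geometric reading of the coefficient as a cosine.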


In time series the correlations which chiefly come into play are those 
between a certain sequence of values and the same or another sequence 
of values under a shift in time, time being represented by the index of 
the data to be correlated. If this is a sequence $\dots, x_1, x_2, \dots, x_n, \dots$, we
take the correlation

$$\varphi_j = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{k=-N}^{N} x_{k+j} \, \bar{x}_k \tag{0.4215}$$

to be the so-called auto-correlation coefficient of the sequence
$\dots, x_1, x_2, \dots, x_n, \dots$, and similarly we take

$$\lim_{N \to \infty} \frac{1}{2N+1} \sum_{k=-N}^{N} x_{k+j} \, \bar{y}_k \tag{0.422}$$

to be the cross-correlation coefficient of the $x$ sequence with the sequence
$\dots, y_1, y_2, \dots, y_n, \dots$. These auto-correlation coefficients and
cross-correlation coefficients are of course functions of the lag $j$. It will be seen
that the auto- and cross-correlation coefficients are independent respectively
of a shift in the time origin for the sequence of $x$'s alone or the
pairs of $x$ and $y$ sequences. Where we are dealing with continuous data
instead of discrete data the sequence $\dots, x_1, x_2, \dots, x_n, \dots$ has as an
analogue the function $f(t)$, where the variable $t$ corresponds to the
subscript of the $x$. As an analogue of the auto-correlation in the discrete
case we obtain the quantity

$$\varphi(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t+\tau) \, \bar{f}(t) \, dt \tag{0.424}$$

which we also call the auto-correlation. Similarly the quantity

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t+\tau) \, \bar{g}(t) \, dt \tag{0.425}$$


is termed the cross-correlation of f and g. We shall see that these quan- 
tities bear an intimate relation to the theory of spectra or periodograms 
and that they are quite as significant for the electrical engineer as they 
are for the statistician. 
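The limit defining the auto-correlation coefficient can only be approximated in practice by a finite average. The sketch below, a hedged illustration rather than a prescribed procedure, averages the lagged products over 2N + 1 terms for a real sequence (so that the complex conjugate plays no part) and lets the time origin be moved, to show that the result does not depend on it. The function names, the period-8 cosine used as data, and the value of N are all assumptions.

```python
import math

def autocorrelation(x, lag, big_n, origin=0):
    """Average of x(k + lag) * x(k) over the 2N + 1 indices centred on `origin`."""
    indices = range(origin - big_n, origin + big_n + 1)
    return sum(x(k + lag) * x(k) for k in indices) / (2 * big_n + 1)

def x(k):
    # Hypothetical data: a cosine of period 8, defined for every integer k.
    return math.cos(2.0 * math.pi * k / 8.0)

phi_0 = autocorrelation(x, 0, 4000)
phi_1 = autocorrelation(x, 1, 4000)
phi_2 = autocorrelation(x, 2, 4000)
phi_1_shifted = autocorrelation(x, 1, 4000, origin=3)   # same lag, new time origin
```

For this particular sequence the limiting value at lag j is half the cosine of 2πj/8, so the estimates can be compared with a known answer.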


*In this book sections and equations are numbered by a flexible system analo- 
gous to the Dewey Decimal System, which permits the inclusion of other sections 
and equations as they are needed. 


0.43 The Periodogram


The process of obtaining the auto-correlation coefficient of a given 
sequence or function preserves certain information concerning the se- 
quence or function and discards certain other information. The informa- 
tion preserved is such as to throw away all relations of phase of the 
frequency components and to preserve all information concerning their 
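frequencies and amplitudes. For example when

$$f(t) = \sum_n a_n e^{i \lambda_n t} \tag{0.431}$$

we shall have

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t+\tau) \, \bar{f}(t) \, dt = \sum_n |a_n|^2 e^{i \lambda_n \tau}. \tag{0.432}$$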


Thus the information given by the auto-correlation coefficient is of the 
same nature as the information given by the so-called periodogram of 
a sequence or function, in which we give the square of the amplitude of 
every trigonometrical component as the function of the frequencies by 
disregarding phase relations. However, the periodogram of a function 
in the narrowest sense, originally given to it by Sir Arthur Schuster, is 
concerned only with that part of the function which is actually the sum 
of a discrete set of trigonometric terms. The present author† has developed
a form of periodogram theory in which not only the lines in the
spectrum of a function or a sequence play a role but also that residue
which is left after the removal of all sharply defined lines. More of this 
will be found in our first chapter. The periodogram theory as thus ex- 
tended will be found to cover exactly the same ground as the theory of 
the auto-correlation coefficient. Similarly, an extension of this periodo- 
gram theory for pairs or other sets of several functions will be found to 
lie in close relation with the theory of cross-correlation. 
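The loss of phase information can be checked numerically. The following sketch, offered under stated assumptions and not as the method of the text, builds two trigonometric sums with the same frequencies and the same absolute amplitudes but different phases, and estimates the time average of f(t + τ) multiplied by the conjugate of f(t) over a long finite interval. The frequencies, amplitudes, interval half-width, and step count are hypothetical choices; a finite interval leaves a small residue from the cross terms.

```python
import cmath

def trig_sum(amplitudes, frequencies):
    """Return f with f(t) = sum of a_n * exp(i * lambda_n * t)."""
    def f(t):
        return sum(a * cmath.exp(1j * lam * t)
                   for a, lam in zip(amplitudes, frequencies))
    return f

def averaged_lagged_product(f, tau, half_width=200.0, steps=20000):
    """Midpoint-rule estimate of (1/2T) * integral of f(t + tau) * conj(f(t)) dt."""
    dt = 2.0 * half_width / steps
    total = 0j
    for i in range(steps):
        t = -half_width + (i + 0.5) * dt
        total += f(t + tau) * f(t).conjugate()
    return total / steps

frequencies = [1.0, 2.5]
f_plain = trig_sum([2.0, 1.0], frequencies)
f_rephased = trig_sum([2.0 * cmath.exp(0.7j), cmath.exp(-1.9j)], frequencies)

tau = 0.8
r_plain = averaged_lagged_product(f_plain, tau)
r_rephased = averaged_lagged_product(f_rephased, tau)
# Sum of squared absolute amplitudes times exp(i * lambda_n * tau).
expected = 4.0 * cmath.exp(1j * 1.0 * tau) + 1.0 * cmath.exp(1j * 2.5 * tau)
```

Both estimates should agree with each other and with the expected sum to within the residue of the finite interval, although the two functions themselves differ.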

As will be indicated by our discussion in Chapter I, one of the opera- 
tions which is often performed on time series is the search by these 
methods or others for certain hidden periodicities. Except in a few 
cases when certain enthusiasts have taken these periodicities as ultimate 
realities, the purpose of the search for hidden periodicities is in some way 
to aid the extrapolation of time series either into the past or into the 
future. While this is the main purpose behind the periodogram analysis 
which plays so large a part in statistical literature, it is doubtful that 
we can find anywhere a thoroughly satisfactory statement of how such 
periodograms can be most effectively used in prediction. What we shall 
do in the present volume is to make a direct attack on the prediction
problem in which many of the same notions as those found in periodogram
analysis will play a part, but we shall approach the question from
the standpoint of determining the optimum method of prediction rather
than from that of giving an independent weight to the problem of search
for hidden periodicities.

† Generalized Harmonic Analysis, Acta Mathematica, Vol. 55, pp. 117-258, 1930.

The history of periodogram analysis is rather interesting in the light 
that it throws on the question of the bond between statistical theory and 
communication engineering. The method of periodogram analysis is due 
to the late Sir Arthur Schuster and had its origin in his researches in 
optics and in particular on the problem of coherency and incoherency 
in light. This statistical theory goes back to a physical origin closely 
related to communication engineering. 


0.44 Operational Calculus 


We now come to notions which belong to the repertory of the com- 
munications engineer as distinguished from the statistician. One of the 
first of these pieces of technique is the method of studying an electric 
circuit by ascertaining its response to an instantaneous switching on of a 
voltage or application of a current. This method, much pursued by the 
late J. R. Carson of the American Telephone and Telegraph Company,
is perhaps the only method which is equally familiar to communications 
and to power engineers. As such it has played a large role in existing 
servo-mechanism technique. The functions obtained for these switching 
transients satisfy elementary systems of differential equations with 
constant coefficients and elementary initial or terminal conditions.

Equations of this sort have interested mathematicians from early 
times, because of the simple way in which many operators with constant 
coefficients are compounded, this composition amounting to a multi- 
plication in elementary algebra, if we treat the differential operator 
d/dt as a quasi-number. While methods of this sort are at least as old 
as Laplace, the credit for developing them into a practical technique for 
circuit computation is unquestionably due to the late Oliver Heaviside. 
From about 1900 until 1930, Heaviside’s methods dominated the whole
of communications engineering technique, and their rigorous mathe- 
matical justification was a moot question in engineering circles. Towards 
the end of this period, however, several avenues of approach for the 
rigorous mathematical justification of the formal Heaviside calculus 
were found, and with this it came to be appreciated that Heaviside’s 
work belonged directly together with the theory of the Laplace and 
Fourier integrals. The construction of a comprehensive table of Fourier 
transforms by Drs. Campbell and Foster of the Bell Telephone Labora- 
tories almost at once replaced the Heaviside calculus by the classical 
Fourier integral theory as the method of choice in communications
engineering. This place the latter still holds.


0.45 The Fourier Integral; Need of the Complex Plane 


If $f(t)$ is a function defined over the infinite interval then under certain
circumstances the expression

$$g(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t) \, e^{-i\omega t} \, dt \tag{0.451}$$

exists, where $\omega$ also lies anywhere on the infinite line. Under certain
somewhat more restricted circumstances not only does $g(\omega)$ exist but

$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(\omega) \, e^{i\omega t} \, d\omega. \tag{0.452}$$

The functions $f(t)$ and $g(\omega)$ are said to be Fourier transforms, each of
the other, and the representation

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\omega t} \, d\omega \int_{-\infty}^{\infty} f(s) \, e^{-i\omega s} \, ds \tag{0.453}$$

is said to be the Fourier integral representation of $f(t)$. There is a satisfactory
Fourier integral theory which does not move from the axis of
real values of $\omega$. On the other hand, the study of $g(\omega)$ in a complex half-plane
or even in the entire complex plane has proved to be very fruitful
in electrical circuits. This was clearly seen by Bromwich, Doetsch,
Wiener and others.‡ The use of the complex plane is intimately allied
to the fact that physically applicable operators of engineering allow us 
to work with the past of our data, but not with their future. A somewhat 
different way of stating what amounts to the same thing is to assert that, 
in the discussion of vibrating systems, the position of singularities of 
certain associated functions expressed in the complex plane will deter- 
mine whether these systems have oscillations which die out to zero or 
which grow to infinity. These considerations are familiar to everyone 
who has worked in filter theory, but they are far from familiar to the 
average statistician and have, until recently, played no part in the 
classical development of the theory of time series. 
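
A short worked example may make the role of the complex plane concrete. The function chosen and the convention used (kernel $e^{-i\omega t}$ with the factor $1/\sqrt{2\pi}$) are assumptions made for this illustration. Take a response which vanishes before the origin of time and dies out afterwards, $f(t) = e^{-t}$ for $t \ge 0$ and $f(t) = 0$ for $t < 0$. Then

$$g(\omega) = \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-t} e^{-i\omega t} \, dt = \frac{1}{\sqrt{2\pi}} \, \frac{1}{1 + i\omega},$$

whose only singularity lies at $\omega = i$, in the upper half-plane. If instead the response grows, $f(t) = e^{t}$ for $t \ge 0$, the integral converges only for $\operatorname{Im} \omega < -1$, where

$$\frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{t} e^{-i\omega t} \, dt = \frac{1}{\sqrt{2\pi}} \, \frac{1}{i\omega - 1},$$

and the singularity has moved to $\omega = -i$, in the lower half-plane. Under this convention, then, the half-plane in which the singularity lies distinguishes the oscillation which dies out from that which grows.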


‡ For these and other articles on this topic see Harold Jeffreys, Operational Methods
in Mathematical Physics, 2nd Ed., Cambridge University Press, 1931, Bibl. pp.
114-117.


0.5 Time Series and Communication Engineering: The Synthesis

We are thus confronted with two groups of techniques, each of which
is intrinsically relevant both to time-series work and to communications
engineering. Methods involving probability theory and correlation are
part of the traditional stock in trade of the statistician, but, on the other
hand, the use of the complex plane is quite foreign to his training. The
complex plane of function theory has a long history in communication
technique, but statistical methods do not, and, as things stand at present,
a man may be a practiced communications engineer without even being
aware of their existence. Fourier methods belong partly to the repertory
of each, for they occur in the theory of the periodogram and in the operational
calculus, but they have been applied neither by the statistician
nor by the communications engineer with a full awareness of their
power. It is the purpose of this book to introduce methods leading from
both these existing techniques and fusing them into a common technique
which, in the opinion of the author, is more effective than either existing
technique alone.


0.51 Prediction

Let us now consider some of the things which we can do with time 
series or messages. The simplest operation which we can perform is that 
of extrapolating them or, in other words, of prediction. This prediction, 
of course, does not in general give a precise continuation of a time series 
or message, for, if there is new information to come, this completely 
precludes an exact estimate of the future. In accordance with the sta- 
tistical nature of time series, they are subject to a statistical prediction. 
This means that we estimate the continuation of a series, which, within 
certain limitations, is most probable, or at any rate the continuation of 
a series which minimizes some statistically determinable quantity known 
as the error. This consideration also applies to whatever other operations
we can perform on time series. 
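
To illustrate what it means to choose a continuation which minimizes a statistically determinable error, consider the simplest conceivable predictor, which estimates the next value of a series as a fixed multiple of the present one. The sketch below is an illustration under stated assumptions and not the method developed in this book, which uses the whole past: the series is a synthetic one generated with a seeded random number source, and the function names are hypothetical. The gain is chosen to make the mean square of the prediction error over the record as small as possible.

```python
import random

def best_one_step_gain(xs):
    """Gain c minimizing the mean of (x[k+1] - c * x[k])**2 over the record."""
    pairs = range(len(xs) - 1)
    return sum(xs[k + 1] * xs[k] for k in pairs) / sum(xs[k] ** 2 for k in pairs)

def mean_square_error(xs, gain):
    """Mean square of the one-step prediction error for a given gain."""
    pairs = range(len(xs) - 1)
    return sum((xs[k + 1] - gain * xs[k]) ** 2 for k in pairs) / len(pairs)

# Hypothetical data: each value is 0.8 times the last plus fresh random input.
rng = random.Random(0)          # fixed seed so that the sketch is repeatable
series = [0.0]
for _ in range(5000):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

gain = best_one_step_gain(series)
error_at_best = mean_square_error(series, gain)
error_without_prediction = mean_square_error(series, 0.0)
```

The error does not vanish at the best gain, since the fresh random input is new information which no predictor can anticipate; it is merely smaller than for any other gain.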


0.52 Filtering

The next important operation on time series is that of purification or 
filtering. Very often the quantity which we really wish to observe is 
observable only after it has been in some way corrupted or altered by
mixture, additively or not, with other time series. It is of importance to 
ascertain as nearly as possible in an appropriate sense, that is, in a 
statistical sense, what our data would have been like without the con- 
tamination of the other time series. This may be the complete problem 
before us; or it may be combined with a prediction problem, which 
means that we should like to know what the uncontaminated time series 
will do in the future; or we may allow a certain amount of leeway in 
time and ask what the uncontaminated time series has done at a certain 
past epoch. This problem comes up in wave filtering. We have a message 
which is a time series, and a noise which is also a time series. If we seek
that which we know concerning the message, which is not bound to a 
specific origin in time, we shall see that such information will generally 
be of a statistical nature; and this will likewise be true of our informa- 
tion of the same sort concerning the noise alone, or the noise and the 
message jointly. While this statistical information will in fact never be 
complete, as our information does not run indefinitely far back into the 
past, it is a legitimate simplification of the facts to assume that the 
available information runs back much further into the past than we are 
called upon to predict the future.

The usual electrical wave filter attempts to reproduce a message “in
its purity,” when the input is the sum of a message and a noise. In case
the measure of the purity of a message is the mean power of its perturba- 
tion, and the apparatus allowed to us for filtering is of linear character, 
the desired statistical information concerning the noise and the message 
alone will be that furnished by their spectrum or periodogram. The 
extra information required concerning the two together is exactly that 
which may be derived from their cross correlation. 
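
The dependence of the best linear apparatus on exactly these statistics can be seen in a deliberately reduced case. Suppose, as an assumption made only for this sketch, that the filter is a single fixed gain applied to the input m + n, with no memory and no dependence on frequency. The mean power of the perturbation then involves only the mean power of the message, the mean power of the noise, and their cross-correlation at zero lag, and the minimizing gain follows by elementary calculus. The function and parameter names are hypothetical.

```python
def optimal_gain(message_power, noise_power, cross_power=0.0):
    """Gain g minimizing the mean of (m - g * (m + n))**2.

    message_power is the mean of m*m, noise_power the mean of n*n, and
    cross_power the mean of m*n (the cross-correlation at zero lag).
    """
    input_power = message_power + 2.0 * cross_power + noise_power
    if input_power <= 0.0:
        raise ValueError("the input m + n must have positive mean power")
    return (message_power + cross_power) / input_power

def perturbation_power(gain, message_power, noise_power, cross_power=0.0):
    """Mean power of the error m - gain * (m + n)."""
    return ((1.0 - gain) ** 2 * message_power
            - 2.0 * gain * (1.0 - gain) * cross_power
            + gain ** 2 * noise_power)

# Hypothetical powers: message four times as strong as an uncorrelated noise.
g = optimal_gain(4.0, 1.0)
best = perturbation_power(g, 4.0, 1.0)
passed_unaltered = perturbation_power(1.0, 4.0, 1.0)   # input passed through as it is
```

A filter with memory replaces the single gain by a weighting of the past of the input, and the three numbers above are then replaced by the spectra and the cross-correlation as functions of the lag.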

While the pure filtering problem is clearly distinguishable from the 
prediction problem, mixed problems involving elements of both are of 
great importance. The filter problem as we have described it is that 
in which a message is to be imitated without a time delay. In practical 
circuit problems, a uniform delay is not undesirable if it is not excessive, 
and the theory must be adapted to this fact. Indeed, good filter perform- 
ance depends on the introduction of a delay. If the delay is negative, 
the performance suffers, but, on the other hand, the filter becomes a 
filtering predictor, which is often a useful instrument. 

The problem of filtering may occur on a very different time level in 
mechanical circuits; for example, if data are put into a circuit by the
operation of a manual crank, it will frequently occur that cranking 
errors are in some way superimposed on the true data and must be 
eliminated to prevent the harm they may do to the future interpretability 
of these data. Even in statistical time series given purely numerically, 
it is not always possible to eliminate errors in the collection of the data 
values, and it is often necessary to resort to a mathematical method
intended to minimize the effect of these errors after the data have been 
collected. 


0.53 Policy Problems 

The time-series problems of prediction and filtering have one aspect 
in common which we may cover by saying that they are extrinsic.
They operate on data which have been collected completely, and do 
not give us direct information as to how these data might have been
altered if we had made some humanly possible variation in the system 
in which these data occurred. Opposed to this use of time series, or at 
least apparently opposed, is what we may call the intrinsic study of time 
series. For example, we may wish to ascertain how certain series dealing 
with the economics of a country might have been changed if a different 
system of taxation had been adopted, or we may wish to know the 
expected effect of a new dam in a river system on the statistics of floods. 
Questions of this sort are often called questions of policy. Let it be noted 
that in the long run questions of policy, although they depend on what 
we may call the dynamics rather than the kinematics of the systems to 
which they pertain, deal with a dynamics which, after all, itself can only 
be determined statistically. In this sense, what is an intrinsic problem 
for a single series or a small collection of series may prove to be an 
extrinsic problem if we are dealing with a sufficiently complicated 
multiple time series. Thus the distinction between extrinsic and intrinsic 
problems, while important, may not be considered absolute. 


0.6 Permissible Operators: Translation Group in Time 


The theory of the proper treatment of time series involves the selection 
of certain operators on time series from among all possible operators, or 
at least from a subsidiary class of possible operators. One requirement on 
such operators is immediate. If we are dealing with any field of science 
where long-time observations are possible and where experiments can be 
repeated, it is desirable that our operations be not tied to any specific 
origin in time. If a certain experiment started at ten o’clock today would 
give a certain distribution of results by twelve o’clock, then we must 
expect that if this experiment is carried out under similar conditions at 
ten o’clock tomorrow, by twelve o’clock we should get the same distribu- 
tion of results. Without at least an approximate repeatability of experi- 
ments, no comparisons of results at different times are possible, and there 
can be no science. That is, the operators which come into consideration are 
invariant under a shift in the origin of time. These shifts can be combined 
in such a way that the result of two consecutive shifts is a shift, while 
the inverse of a shift is a shift. In mathematical language, they constitute 
a group, known as the translation group in time. In other words, the 
allowable operators must be invariant with respect to the translation 
group in time. We shall assume that, even where this is not strictly the
case, as may occur in economic data owing to the extreme variability 
of their background, some attempt has been made to eliminate the 
trends and drifts before the methods of this book are applied. 


0.61 Past and Future


Not every operator invariant under the translation group is allowed. 
We have said already that, while the past of a time series is accessible 
for examination, its future is not. That means that our operators must 
have a certain inherent one-sidedness. In the case of linear operators,
an adequate theory of these one-sided operators is in existence. In the 
general non-linear case, much remains to be done in this direction. 


0.62 Subclasses of Operators 


We have just spoken of linear operators. In this connection, the opera- 
tors which are allowed to us for a given task of prediction or filtering or 
policy prescription are not always the complete class of all conceivable 
operators. Either in order to facilitate computation or on account of the
present imperfection of engineering techniques of realization we may 
have to restrict ourselves to a much narrower class. One such restricted 
class is that of all linear operators; or, better stated, all linear operators 
depending only on the past and invariant under translation. This is a 
class for which the theory and computation are particularly easy, and 
it is also a class for which adequate electrical and mechanical realiza- 
tions are at hand. However, there are times when we may not be at 
liberty to restrict ourselves to this task, as well as times in which, for 
reasons of technical convenience, we may not have this full class at our 
disposal. 

Even the class of linear operators invariant under a translation in 
time and not referring to the future of a function is a class containing 
widely diverse members. For example, the operators on f(t) yielding

$$f(t); \quad 3f(t-5); \quad f(t-2); \quad f(t) + f'(t); \quad \int_0^\infty f(t-s)\, e^{-s}\, ds$$
are all examples of such operators. On the other hand, the operators
yielding

$$|f(t)|; \quad [f(t)]^2; \quad \int_0^\infty [f(t-s)]^2\, e^{-s}\, ds$$
are not linear; that yielding

$$\int_{-\infty}^{\infty} f(t-s)\, e^{-s^2}\, ds$$

depends in fact on the future of f(t); and that yielding

$$t\, f(t)$$

is not invariant under the translation group, since

$$(t+5)\, f(t+5) \neq t\, f(t+5).$$
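The three properties just distinguished (linearity, invariance under translation, and independence of the future) can be tried out numerically. The following is only a sketch under stated assumptions: time is sampled at the integers, the integrals are replaced by truncated sums, and every name in it (`delay_scale`, `one_sided_smooth`, `is_linear`, and so on) is hypothetical and chosen for illustration.

```python
import math

# Discrete stand-ins for some of the operators discussed above.
# Each takes a function f of integer time and returns a new function of time.
def delay_scale(f):
    return lambda t: 3.0 * f(t - 5)

def one_sided_smooth(f, terms=40):
    # truncated analogue of the integral of f(t - s) e^{-s} over s >= 0
    return lambda t: sum(f(t - s) * math.exp(-s) for s in range(terms))

def square(f):
    return lambda t: f(t) ** 2

def two_sided_smooth(f, terms=6):
    # the weights extend to negative s, so values after time t are used
    return lambda t: sum(f(t - s) * math.exp(-s * s) for s in range(-terms, terms + 1))

def time_weight(f):
    return lambda t: t * f(t)

# Arbitrary test functions and test times (assumptions of this sketch).
F = lambda t: math.sin(0.3 * t) + 0.1 * t
G = lambda t: math.cos(0.7 * t)
TIMES = range(-10, 11)

def close(x, y):
    return abs(x - y) <= 1e-9 * (1 + abs(x) + abs(y))

def is_linear(op):
    mix = lambda t: 2.0 * F(t) - 3.0 * G(t)
    return all(close(op(mix)(t), 2.0 * op(F)(t) - 3.0 * op(G)(t)) for t in TIMES)

def is_translation_invariant(op, shift=5):
    shifted = lambda t: F(t + shift)
    return all(close(op(shifted)(t), op(F)(t + shift)) for t in TIMES)

def ignores_future(op):
    # change F only at times strictly later than t; the output at t must not move
    def unchanged_at(t):
        altered = lambda s: F(s) + (1.0 if s > t else 0.0)
        return close(op(altered)(t), op(F)(t))
    return all(unchanged_at(t) for t in TIMES)

for op in (delay_scale, one_sided_smooth, square, two_sided_smooth, time_weight):
    print(op.__name__, is_linear(op), is_translation_invariant(op), ignores_future(op))
```

These checks examine only a few test functions and times, so a result of `True` is evidence rather than proof, whereas a result of `False` exhibits a genuine counterexample.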


0.7 Norms and Minimization 


The task of determining the best operator for a particular purpose 
depends on the definition of “best.” One very general type of such 
definitions is that in which we assign a certain quantity as the error of 
performance of our operator, and then seek to choose our operator from 
the admissible class in such a way that the error it produces is as small 
as possible. Thus the best operator for prediction will be that which 
minimizes the numerical measure of the difference between the actual 
future of a time series and its predicted future. This numerical measure 
should itself not be tied to an origin in time and should be a single quan- 
tity even though the difference which it measures is that between two 
functions. 

The numerical measure of an error we shall call its norm. The desired 
conditions for a norm are the following: 


First, the norm must always be a positive quantity, or zero. 
Second, the norm must be positive whenever an error exists and zero 
whenever an error does not exist. 


The simplest types of norm to fulfill these conditions are what we may 
call the quadratic norms. Such a norm depends upon the error in such a 
way that when we multiply the error by $\pm A$ we multiply its norm by $A^2$.

The quadratic norm of a voltage which is a function of the time is, 
on a proper scale, a quantity of the nature of power. The quadratic 
norm of the error of a message determines the power of the correcting 
message. It is this that gives quadratic norms their physical importance. 

These norms are not the only ones which are possible. For example, 
we may have third- or fourth-power norms. 
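As a small illustration of the scaling property, one may take as norm the mean of a power of the absolute error over a finite record. This is a sketch only: the function name `power_norm` and the sample error values are assumptions made for the example, and the mean over a finite record is merely one convenient stand-in for the norms discussed here.

```python
def power_norm(error, p=2):
    """Mean of |e|^p over the samples of an error record (p = 2 is the quadratic case)."""
    return sum(abs(e) ** p for e in error) / len(error)

error = [0.5, -1.0, 2.0, 0.0, -0.25]   # illustrative error samples
A = 3.0
scaled_plus = [A * e for e in error]
scaled_minus = [-A * e for e in error]

print(power_norm(error), power_norm(scaled_plus), power_norm(scaled_minus))
print(power_norm(error, 4), power_norm(scaled_plus, 4))
```

Multiplying the error by A or by -A multiplies the quadratic norm by the square of A, while the fourth-power norm is multiplied by the fourth power of A; both vanish only when every sample of the error is zero.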

We thus have specified the problem of prediction or filtering or policy- 
prescription control as the selection of an operator from among a certain 
admissible class, in such a way that some “error quantity” (which 
estimates the degree by which our statement of the problem is not per- 
fectly solved) shall be minimized. If a minimization problem is carried 
out in a system of a finite number of degrees of freedom, it belongs to 
the ordinary differential calculus. If the system of operations over which 
we have to minimize is more extensive than such a system (of a finite 
number of degrees of freedom), the problem of minimization belongs to 
the calculus of variations of Euler and Lagrange or, at any rate, to one 
of its extensions. Let it be noted that the more general minimization 
problems which may arise in the work under discussion go well beyond
the calculus of variations as it is commonly defined at present, since 
the quantity our problem requires us to minimize may be determined, 
not by a function which is in the ordinary sense defined in a space of a 
finite number of degrees of freedom, but by a functional operation involv- 
ing the entire past of the functions with which we are dealing and not 
reducible to one which is restricted to a space of a finite number of degrees 
of freedom. 


0.71 The Calculus of Variations 


Every minimization problem demands a set of variable entities 
(numbers, vectors, functions) over which the desired minimization is 
to be performed. The set of functions afforded for minimization purposes 
should have at least this one property, that the sum of any two functions 
or operators should also belong to the set. Let us then symbolically
represent such an operator by the symbol Op and another operator
which we shall call its variation by the symbol δOp. Let us consider the
restricted class of operators given by Op + ε δOp. In this the quantity
ε is a real quantity which may be varied at will. Let the norm of the
error resulting from the use of Op + ε δOp be n(Op + ε δOp). If we are
to select Op in such a way that n(Op + δOp) is to be a minimum when
δOp = 0, we must a fortiori minimize n(Op + ε δOp) at ε = 0 for every
choice of δOp. Note that it does not immediately follow that to minimize
n(Op + ε δOp) for every choice of ε will result in a complete solution of
the minimization problem with a free choice of the operator throughout
our entire field of operators, but this limited minimization is “at least”
necessary. This minimization, depending on ε, is of the same nature as
the familiar minimizations of functions of a single variable which we 
find in the classical differential calculus. In the ordinary calculus, if we 
are minimizing a quantity having everywhere a derivative which is 
continuous, and if we can in one way or another eliminate from consid- 
eration the boundary values of our parameter, then the minimum of our 
dependent quantity is attained at a point where its derivative with re- 
spect to the independent quantity vanishes. Such a point is called a 
stationary point. It is, of course, not true that every stationary point 
is even a local minimum or maximum in the sense that it is a minimum 
or maximum for near-by values. It is even less true that a local minimum 
or maximum need be an absolute minimum or maximum. If, however, 
we have other evidence for the existence of one and only one minimum
in the interior of the region considered, and if we have found a stationary 
point in this region, then the stationary point must be the minimum in 
question. Therefore, the first stage in the investigation of minima is the 
search for stationary points. 
In the present case, this leads to the system of equations

$$\frac{\partial}{\partial \epsilon}\, n(\mathrm{Op} + \epsilon\, \delta\mathrm{Op}) = 0 \quad \text{when } \epsilon = 0 \tag{0.711}$$

for all admissible δOp.

This system of equations may often be reduced to a single equation
in which δOp does not appear explicitly. However, as we have stated
above, such an equation must be supplemented by a further investiga- 
tion to determine whether it yields a solution which is a true absolute 
minimum. Furthermore, there will be cases in which the equation has 
no solution satisfying the conditions which we have originally set up for 
the class of functions in terms of which the original minimization was 
to have been performed. In other words, we may have accidentally 
selected a class of operators which permits us to make the norm of our 
error as near its minimum value as we wish, but will not permit it 
actually to assume this value. It may or may not be possible to remove 
this difficulty by assuming a more comprehensive class of admissible
operators to begin with. Even if this cannot be done, this does not mean 
that the whole problem of minimization has become trifling. In the 
absence of a true optimum method of determining a prediction or filter- 
ing operator, it is often of practical value to determine an operator which 
will produce as the norm of the error a value arbitrarily near its mini- 
mum. In many cases this can be done by tying up the minimiza- 
tion problem with a related problem having slightly differing condi- 
tions, in which a true minimum exists. We shall have occasion to do 
exactly this. 
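The condition (0.711) may be seen at work in the simplest possible setting, that of a system with two degrees of freedom. In the sketch below, which is an illustration and not the method of the later chapters, the "operator" is assumed to be a pair of weights h applied to two observed quantities, the norm is the mean square error, and the data and all function names are hypothetical. Requiring the derivative with respect to ε to vanish for every variation reduces to two linear equations in which the variation no longer appears.

```python
def norm(h, rows, targets):
    """Quadratic norm: mean square of the error left by the weights h."""
    total = 0.0
    for row, y in zip(rows, targets):
        prediction = sum(w * x for w, x in zip(h, row))
        total += (y - prediction) ** 2
    return total / len(rows)

def stationary_weights(rows, targets):
    """Solve the two linear equations obtained from d/d(eps) norm = 0 for every variation."""
    n = len(rows)
    a11 = sum(r[0] * r[0] for r in rows) / n
    a12 = sum(r[0] * r[1] for r in rows) / n
    a22 = sum(r[1] * r[1] for r in rows) / n
    b1 = sum(r[0] * y for r, y in zip(rows, targets)) / n
    b2 = sum(r[1] * y for r, y in zip(rows, targets)) / n
    det = a11 * a22 - a12 * a12
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

def directional_derivative(h, dh, rows, targets, eps=1e-6):
    """Central-difference estimate of d/d(eps) norm(h + eps * dh) at eps = 0."""
    plus = [w + eps * d for w, d in zip(h, dh)]
    minus = [w - eps * d for w, d in zip(h, dh)]
    return (norm(plus, rows, targets) - norm(minus, rows, targets)) / (2 * eps)

# Illustrative data (assumed for the example).
rows = [(1.0, 0.0), (0.5, 1.0), (-1.0, 0.5), (2.0, -1.0), (0.0, 2.0)]
targets = [1.2, 0.9, -0.7, 1.5, 0.8]

h_star = stationary_weights(rows, targets)
variations = [(1.0, 0.0), (0.0, 1.0), (1.0, -2.0), (-3.0, 0.5)]
for dh in variations:
    print(dh, directional_derivative(h_star, dh, rows, targets))
```

Here the norm is a positive definite quadratic form in the weights, so the stationary point is known on other grounds to be the one and only minimum; in the general case that further investigation remains to be made.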


0.8 Ergodic Theory 


Before taking up concrete prediction or filtering problems, it is worth 
while to discuss the general statistical nature of time series and certain 
statistical parameters relative to such series. If a time series is not tied 
down to a specific origin in time and is conceived to run from minus 
infinity to infinity in time, then any statistical distribution of such series 
will not be affected by a shift of the origin of time. This means that such 
a shift comprises a transformation of one time series into another time 
series which in general alters the individual series and changes any set 
of time series for which a well-defined probability exists into another set 
of different time series, but with the same probability. Without here 
entering upon the analytic refinements essential to a definition of 
Lebesgue measure, it may be said that Lebesgue measure is a technical 
term for a concept agreeing in all essential characters with probability, 
and that we have a situation in which every shift in our time scale by a 
real additive constant (or an integer additive constant, as the case may 
be) generates a probability-preserving (or measure-preserving) trans-
formation of a set of contingencies into itself. 

We are thus confronted with the identical situation contemplated by 
the so-called ergodic theorem of Professor G. D. Birkhoff. The term 
“ergodic” has an interesting history in the literature of statistical 
mechanics. The problem with which it is associated is that of the relation 
between phase averages and time averages. It was essential for the 
statistical mechanics of Willard Gibbs that some means be found to 
identify averages concerning the consecutive positions of a dynamical 
system of a certain sort with averages concerning all systems at a certain 
energy level. The first method suggested for making this identification 
was to assume that such a system in time would assume all positions 
compatible with the energy level in question. This assumption, how- 
ever, is completely inadequate to the conclusions drawn from it; and, 
even worse, it can never be satisfied save in trivial cases. The suggested
weakened hypothesis, that the system in course of time passes arbi- 
trarily near every possible position, is not contradictory but is inadequate 
to the derivation of any significant conclusion. The correct theorem of 
Birkhoff, in the language of probability, reads as follows: Let Σ be a set
of contingencies of finite probability. Let T be a transformation turning
a contingency P into a contingency TP, and let it leave the set of
contingencies Σ unaltered. If $\Sigma_1$ is a set of contingencies contained in Σ,
for which a probability exists, and $T\Sigma_1$ is the set of its transforms by T,
let the probability of $\Sigma_1$ be the same as that of $T\Sigma_1$. Let |F(P)| have
an average over Σ. Then, except for a set of contingencies P of zero
probability, the limit

$$\lim_{N\to\infty} \frac{1}{N+1} \sum_{n=0}^{N} F(T^n P)$$

will have a definite value.

In this case we have been considering the transformation T and its
powers. It is possible to introduce the notion of the fractional powers of
a transformation T. Let $T^\lambda$ be a measure-preserving transformation no
matter whether the value of λ is positive, negative, or zero, so long as
it is real. Let $T^\lambda(T^\mu P)$ always be equal to $T^{\lambda+\mu}P$. Let us make an
auxiliary assumption, of interest only to the pure mathematician, to the
effect that $T^\lambda P$ is measurable in both λ and P. Let

$$\int |F(P)|\, dV_P < \infty,$$

where $dV_P$ signifies integration with respect to volume for the point P.
Then Birkhoff’s theorem is to the effect that the limit

$$\lim_{T\to\infty} \frac{1}{T} \int_0^T F(T^\lambda P)\, d\lambda \tag{0.8005}$$

will almost always exist in the sense that the probability that a point P
should lie in a set for which this limit does not exist is 0.

To go back to the discrete case, let us consider the time series ⋯,
$a_{-n}$, ⋯, $a_0$, ⋯, $a_n$, ⋯. In this time series let the particular numbers
themselves depend upon some quantity α which we shall call the parameter
of distribution and which we shall take to lie between 0 and 1. In
other words, let us not consider the individual time series {$a_n$}, but
rather a statistical distribution of time series {$a_n(\alpha)$}. Let any set of
time series for which this parameter of distribution α lies on a set $S_1$
(of values with probability p) go into another set for which the parameter
α lies on a set $S_2$ (of values with the same probability p), when
each $a_n$ goes into $a_{n+1}$. In a language often used in statistical theory we
shall say that the time series is stationary. Furthermore, let the average
of $|a_0|^2$ with respect to the parameter α be finite. Then Birkhoff’s
theorem for the discrete case tells us that, except for a set of values of
α of zero probability, the so-called auto-correlation

$$\lim_{N\to\infty} \frac{1}{N+1} \sum_{n=0}^{N} a_{n+k}(\alpha)\, \overline{a_n(\alpha)} \tag{0.801}$$

will exist (for all k) and be finite.

If we have two distinct time series, that of the $a_n$’s and a similar one
of the $b_n$’s dependent on the same parameter of distribution α, and if
the means with respect to α of $|a_0|^2$ and $|b_0|^2$ both are finite, and if the
same transformation of α generates the change of $a_n$ into $a_{n+1}$ and the
change of $b_n$ into $b_{n+1}$, then, except for a set of values for α of zero
probability, the cross-correlation between the a’s and the b’s

$$\lim_{N\to\infty} \frac{1}{N+1} \sum_{n=0}^{N} a_{n+k}(\alpha)\, \overline{b_n(\alpha)} \tag{0.803}$$

will also exist (for all k) and be finite.
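The auto-correlation and cross-correlation just described are time averages taken along a single series, and for a finite record they may be approximated directly. The sketch below assumes real-valued sequences (so that no complex conjugate is needed), a finite record in place of the limit, and, purely as an example of a stationary series, a sequence in which each term is a fixed fraction of its predecessor plus an independent Gaussian disturbance. The names `stationary_series` and `time_average_correlation` are hypothetical.

```python
import random

def stationary_series(length, memory, rng, burn_in=500):
    """Real series x[n] = memory * x[n-1] + w[n], with w[n] independent unit Gaussians.

    The first `burn_in` terms are discarded so that the start-up has died away.
    """
    x, value = [], 0.0
    for n in range(length + burn_in):
        value = memory * value + rng.gauss(0.0, 1.0)
        if n >= burn_in:
            x.append(value)
    return x

def time_average_correlation(a, b, k):
    """Finite-record average of a[n + k] * b[n] over all n for which both terms exist (k >= 0)."""
    count = len(b) - k
    return sum(a[n + k] * b[n] for n in range(count)) / count

rng = random.Random(0)
a = stationary_series(50000, 0.6, rng)
b = [0.0, 0.0] + a[:-2]          # b[n] = a[n - 2]: the b's repeat the a's two steps late

auto = [time_average_correlation(a, a, k) for k in range(4)]
cross = [time_average_correlation(b, a, k) for k in range(5)]
print(auto)
print(cross)
```

For this particular example the auto-correlation at lag k should lie near 0.6^k / (1 - 0.36), and the cross-correlation of the b's with the a's should be greatest at the lag of two steps by which the b's follow the a's.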

The notion of stationary time series will apply as well to continuous
series of data as to discrete series of data. A continuous time
series dependent on a parameter α may be written in the form f(t, α).
Let us consider the case in which

$$f(t + \tau, \alpha) = f(t, T^\tau \alpha),$$

where the transformation $T^\tau$ preserves probability in α. Let

$$\int_0^1 |f(0, \alpha)|^2\, d\alpha < \infty. \tag{0.804}$$

It then can be shown that, except for a set of values of α of probability 0,

$$\lim_{T\to\infty} \frac{1}{T} \int_0^T f(t + \tau, \alpha)\, \overline{f(t, \alpha)}\, dt \tag{0.806}$$

will exist and have a definite finite value. Similarly, an analogous result
to (0.803) will hold when we consider two continuous stationary time
series dependent on the same parameter of distribution.

At this point there is a new matter which deserves a little discussion. 
It is a well-known fact of measure theory that a denumerable set of con- 
tingencies of zero probability, that is, a set of contingencies which can 
be exhausted by an arrangement in 1, 2, 3, ⋯ order, adds up to a set
of contingencies which is itself of zero probability. From this it follows
that, if we prove the limit (0.801) to exist except for a set of cases of
zero probability for each k independently, then it exists except for a set
of cases of zero probability for all k’s simultaneously. If, however, we
refer to formula (0.806), it does not follow that, if that limit exists for
almost all values of α for each τ independently, it exists for almost all
values of α simultaneously for all values of τ. Nevertheless it can be
proved mathematically that the limit (0.806) does, in fact, exist for all τ
at the same time, for almost all α.

Let us remark that stationary time series have properties other than 
those of auto-correlation coefficients which may be proved to exist for 
almost all values of α by means of the ergodic theory. For example let

$$f(t + \tau_1, \alpha)\, f(t + \tau_2, \alpha) \cdots f(t + \tau_n, \alpha) \tag{0.808}$$

be absolutely integrable over the range (0, 1) of α. The average

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau_1, \alpha) \cdots f(t + \tau_n, \alpha)\, dt \tag{0.8085}$$

will exist for almost all values of α. It may be shown that quantities such
as (0.808) constitute in a certain sense a complete set of parameters of a
stationary time series, but in this book we shall not deal with this
remark in any detail.

A particular class of measure-preserving transformation of α into
itself is that in which no set of values of α (of probability other than 1 or
zero) is transformed into itself under the transformation T or the set of
transformations $T^n$, respectively. Similarly, in the continuous case we
may have a situation where no set of values of α of measure other than
1 or zero is transformed into itself by all the transformations $T^\lambda$. In
these cases the transformation T is said to be ergodic or metrically transitive.
If T is an ergodic transformation, and F is a function whose absolute
value is integrable, then, except for a set of values of P of probability
zero, we may show that

$$\lim_{N\to\infty} \frac{1}{N+1} \sum_{n=0}^{N} F(T^n P) = \int F(P)\, dV_P. \tag{0.809}$$

Similarly, if $T^\lambda$ represents an ergodic group of measure-preserving
transformations such that

$$T^\lambda(T^\mu P) = T^{\lambda+\mu} P \tag{0.8091}$$

and

$$\int F(P)\, dV_P = A, \tag{0.8092}$$

then for almost all elements P we have

$$\lim_{T\to\infty} \frac{1}{T} \int_0^T F(T^\lambda P)\, d\lambda = A. \tag{0.8093}$$


It may thus be shown that in the ergodic case we have

$$\lim_{T\to\infty} \frac{1}{T} \int_0^T F(T^\lambda P)\, d\lambda = \lim_{T\to\infty} \frac{1}{T} \int_{-T}^{0} F(T^\lambda P)\, d\lambda = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} F(T^\lambda P)\, d\lambda, \tag{0.8094}$$

for almost all elements P. It may also be shown that in the non-ergodic
case the measure-preserving transformation may in a certain sense be
resolved into ergodic components. From this it is not hard to conclude
that formula (0.8094) is also valid in the non-ergodic case for almost
all elements P. Thus in the case of auto-correlation coefficients we almost
always have

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau, \alpha)\, \overline{f(t, \alpha)}\, dt = \lim_{T\to\infty} \frac{1}{T} \int_{-T}^{\min(0,\, -\tau)} f(t + \tau, \alpha)\, \overline{f(t, \alpha)}\, dt, \tag{0.8095}$$

in the sense that the probability that α be a value for which this is not
true is zero. This enables us to make the inductive inference from an
expression

$$\frac{1}{T} \int_{-T}^{\min(0,\, -\tau)} f(t + \tau, \alpha)\, \overline{f(t, \alpha)}\, dt, \tag{0.8096}$$

determined only by the past of the function f(t, α), to an auto-correlation

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau, \alpha)\, \overline{f(t, \alpha)}\, dt, \tag{0.8097}$$

which strictly involves an investigation of both the past and the future.
We shall find this conclusion of vital importance in the process of pre- 
diction, as well as the exactly similar conclusions for pairs of functions, 
f and g, and for the discrete case. 
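The agreement between an average taken over the past alone and one taken over the future, or over both, may be observed on a sampled record. What follows is a sketch under stated assumptions: the series is real and sampled at integer times from -T to T, it is generated by the same kind of simple stationary recursion one might use for any such illustration, the averages divide by the number of terms actually summed (which agrees in the limit with division by T or 2T), and the names `two_sided_series` and `lagged_average` are hypothetical.

```python
import random

def two_sided_series(half_length, memory, rng, burn_in=500):
    """Samples x(t) for t = -half_length, ..., half_length; x(t) is stored at index t + half_length."""
    values, value = [], 0.0
    for n in range(2 * half_length + 1 + burn_in):
        value = memory * value + rng.gauss(0.0, 1.0)
        if n >= burn_in:
            values.append(value)
    return values

def lagged_average(x, offset, t_lo, t_hi, lag):
    """Average of x(t + lag) * x(t) over the integer times t_lo <= t <= t_hi."""
    total = sum(x[t + lag + offset] * x[t + offset] for t in range(t_lo, t_hi + 1))
    return total / (t_hi - t_lo + 1)

T, lag = 30000, 1
rng = random.Random(7)
x = two_sided_series(T, 0.5, rng)

past_only = lagged_average(x, T, -T, min(0, -lag), lag)   # uses x(t) for t <= 0 only
future_only = lagged_average(x, T, 0, T - lag, lag)       # uses x(t) for t >= 0 only
two_sided = lagged_average(x, T, -T, T - lag, lag)        # uses the whole record
print(past_only, future_only, two_sided)
```

For this example all three averages should lie near 0.5 / (1 - 0.25); the point of the illustration is that the first of them is computed without any reference to the future.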


0.81 Brownian Motion 


Let us now look for examples of stationary time series as they are found 
in nature. For such stationary time series to be really interesting from 
the standpoint of communication engineering, it is essential that their 
future be only incompletely determined by their past. This imme- 
diately excludes such time series as are generated by the solutions of the 
wave equations or similar partial differential equations in systems where 
some sort of analyticity is presupposed for the instantaneous data. For a 
long time all physical time series were supposed to be of this sort. How- 
ever, with the progress of the microscope, it became obvious that small 
particles sustained in a liquid or gas were subject to a random motion 
whose future was largely unpredictable from its past.§ If a particle is 
kicked about by the molecules of gas or liquid, it describes a path which 
may be characterized approximately in the following way: The X, Y, 
and Z components of the motion of the particle are completely inde- 
pendent each of the other. Taking any single component, the amount of 
change which it is likely to make in a given time has a distribution com- 
pletely independent of the amount of change which it makes in an 
interval of time not overlapping this. The distribution of this amount of 
change in a given time is what is known as Gaussian; that is, the probability
that the displacement of the particle in time t should lie between
x and x + dx will be of the form

$$\frac{1}{\sqrt{2\pi\rho t}}\, e^{-x^2/(2\rho t)}\, dx, \tag{0.811}$$

where the quantity ρ depends upon the medium in which the particle
is suspended as well as the mass and size of the particle and the tempera- 
ture. The theory of this ensemble of functions has been given by the 
present author,|| and it appears that, although the ensemble is not
strictly a time series in that the origin of reference changes with the time,
there are related to it time series having the ergodic property. If in a
Brownian motion we consider not the position at a given time but instead
sets of positions at more than two times, and then take only their differences,
then the distribution of Brownian motions over an appreciable
interval is unchanged by a shift in the origin of time. Therefore, if we
take any property of a Brownian motion independent of the absolute
position of this motion, we shall generate a time series having the
ergodic property.

§ J. Perrin, Les atomes, 4th Ed., Paris, 1931.
|| Generalized Harmonic Analysis, Acta Mathematica, Vol. 55, 1930.

A particular example of this results from the response of a linear 
resonator to a Brownian input. We shall present the harmonic analysis
of the response of a resonator to the Brownian motion and thus shall 
generate a definite form of statistical ensemble, subject to a specific 
theory of anticipation or filtering. 
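The description of the Brownian motion given above (independent changes over non-overlapping intervals, each Gaussian) lends itself to a simple numerical sketch of one component of the motion. The assumptions are these: the Gaussian law (0.811) is taken to have variance equal to the product of the quantity ρ and the elapsed time t, the path is built on a uniform grid of time steps, and the names `gaussian_density`, `brownian_path`, and `increments`, together with the numerical values used, are hypothetical.

```python
import math
import random

def gaussian_density(x, t, rho):
    """Density of the displacement x in time t, assuming variance rho * t."""
    return math.exp(-x * x / (2.0 * rho * t)) / math.sqrt(2.0 * math.pi * rho * t)

def brownian_path(steps, dt, rho, rng):
    """Positions at times 0, dt, 2 dt, ..., built from independent Gaussian changes."""
    path = [0.0]
    sigma = math.sqrt(rho * dt)
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path

def increments(path, span):
    """Displacements over successive non-overlapping intervals of `span` steps."""
    return [path[i + span] - path[i] for i in range(0, len(path) - span, span)]

rng = random.Random(1)
rho, dt = 2.0, 0.01
path = brownian_path(200000, dt, rho, rng)
displacements = increments(path, 10)        # each covers a time t = 10 * dt = 0.1

variance = sum(d * d for d in displacements) / len(displacements)
neighbour_product = sum(
    displacements[i] * displacements[i + 1] for i in range(len(displacements) - 1)
) / (len(displacements) - 1)
print(variance, rho * 10 * dt, neighbour_product)
```

The position itself wanders without limit, but the displacements depend only on differences of position, and it is their statistics (a mean square near ρt and no correlation between neighbouring intervals) that are unchanged by a shift in the origin of time.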


0.9 Summary of Chapters 


There are other theories of random distribution besides the Brownian 
theory, which likewise are subject to definite techniques of analysis. 
In the discrete case the ensemble of sequences {$a_n$} (where each $a_n$ is
uniformly distributed over some interval, in complete independence of
all other $a_m$’s, and the transformation T changes a sequence by adding
1 to the index n of each term) is an example in point. Again, in the con- 
tinuous case, the Poisson distribution (which may be described as the 
distribution of bullet holes on a remote target, when taken in the one- 
dimensional case on an infinite line) yields a distribution of patterns 
which is invariant under time translation and serves as the basis of a 
prediction or filtering theory. These instances will be discussed in 
Chapter I, together with a large number of other mathematical notions 
which are indispensable for a comprehension of what is to follow. These 
cover in particular much material from the author’s book, The Fourier 
Integral (Cambridge, 1933); his joint Colloquium tract with the late 
R.E.A.C. Paley, Fourier Transforms in the Complex Domain (New York,
1934); and his memoir, Generalized Harmonic Analysis, in Acta Mathematica
(1930). In the present book, proofs are in general not given, but
the main significant results are summarized. 

Chapter II of this text is devoted solely to the discussion of the pre- 
diction problem for a single stationary time series. We shall discuss both 
continuous and discrete series and shall illustrate them by a number of 
examples. We shall show (a) that a certain method which we present is 
an optimum linear method for minimizing a mean square prediction error, 
and that the only statistical parameter which is required concerning the 
time series on which it operates is the auto-correlation coefficient of the 
series. We shall further show (b) that the result is an absolute optimum
in comparison with any alternative method, linear or non-linear, when
the time series to which it applies is obtained as the response of a linear
resonator¶ to a Brownian input, as for example by the response of a
linear electric circuit to a shot effect. Actually to avoid duplication of
material we shall chiefly discuss (a) in Chapter II and (b) in Chapter III.
We shall discuss the technical difficulties of the prediction problem in
the presence of disturbing lines in the spectrum or of other non-absolutely
continuous parts of the spectrum, or when the past of the function
to be predicted completely determines its future; and in at least one case
we shall observe how the validity of a long-time prediction changes with
a change of the auto-correlation coefficient. We shall also consider how
the prediction problem is altered when the predicting operator is built
up as the sum of a denumerable or finite set of multiples of previously
assigned operators.

¶ We shall have something to say of non-linear resonators, but at present their
theory is far from complete.
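The remark that the auto-correlation coefficient is the only statistical parameter required may be illustrated in a much reduced setting. The sketch below is not the method of Chapter II: it assumes a real discrete series and a predictor restricted to a weighted sum of the two most recent values, which is an instance of an operator built up from a finite set of previously assigned operators. The function names and the numerical auto-correlations are hypothetical.

```python
def two_term_predictor(phi0, phi1, phi2):
    """Weights (c1, c2) minimizing the mean square of x[n+1] - c1 * x[n] - c2 * x[n-1].

    Only the real auto-correlations phi_k at lags 0, 1, 2 enter. Setting the
    derivatives of the mean square error to zero gives
        c1 * phi0 + c2 * phi1 = phi1
        c1 * phi1 + c2 * phi0 = phi2
    """
    det = phi0 * phi0 - phi1 * phi1
    c1 = (phi1 * phi0 - phi1 * phi2) / det
    c2 = (phi0 * phi2 - phi1 * phi1) / det
    return c1, c2

def mean_square_error(phi0, phi1, phi2, c1, c2):
    """Mean square prediction error expressed through the auto-correlations alone."""
    return (phi0 - 2.0 * c1 * phi1 - 2.0 * c2 * phi2
            + (c1 * c1 + c2 * c2) * phi0 + 2.0 * c1 * c2 * phi1)

# Illustrative auto-correlations phi_k = 0.6**k / (1 - 0.36), an assumed example.
memory = 0.6
phi = [memory ** k / (1.0 - memory ** 2) for k in range(3)]

c1, c2 = two_term_predictor(*phi)
best = mean_square_error(*phi, c1, c2)
print(c1, c2, best)
```

For these auto-correlations the second weight comes out zero and the first equals 0.6, so that nothing is gained from the older value; any other choice of the two weights gives a larger mean square error.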

In Chapter III we proceed to the filter problem, which we treat along
the same lines as we use for the prediction problem in Chapter II. 
Except for the detailed changes of formulas, there is little difference 
between the steps taken in designing a prediction filter (or a filter with 
lead) and a predictor which has no specified filtering function. In the 
case of a filter with lag the design problem is a little more complicated. 
The filter with lag is the conventional type, because where a modest, 
uniform phase shift is not objectionable, more precise results can be 
obtained by a lagging filter than by a non-lagging filter. In the case of 
the lagging filter we are faced with the additional problem of simulating 
a lag operator $e^{-i\alpha\omega}$ by a rational function.* The problem of realization
takes one into the theory of equivalent networks as developed by 
Guillemin and others. In the design of the general four-terminal net- 
work, it is not enough to realize the transfer voltage ratio. There is also 
the question of the matching of input and output impedances with those 
of the circuits out of which and into which the filter works. Important
as these problems are, they carry one far away from the general con- 
siderations of this tract and may best be treated elsewhere. 

With the restriction to the determination of the characteristic of a 
filter as a rational function, we may still say much that is concrete on 
the design of filters according to our method. The filter with very great 
delay, and its intrinsic error; the effect of lag on improving filter char- 
acteristics, and the method whereby this lag should be decided; the 
algebraic computation of the characteristic of a filter, and the basis for 
determining this—all these can be handled in terms of rational voltage 
ratios, without pursuing the theory of equivalent structures into any 
detail. We shall also have something to say on the filter whose function
is to detect a very faint message in the presence of a nearly overwhelming
noise.

* The emphasis put on rational functions here is of course not a matter of
mathematical formalism alone, but is of essential importance in the actual
realization in the field of a finite electrical or mechanical structure.

Up to this point, we have been considering predictors and filters for 
use in time series consisting of single numerical sequences of data, 
whether these sequences be discrete or continuous. In Chapter IV, we 
proceed to consider multiple time series and, in particular, double time 
series, both from the point of view of filtering and from that of prediction, 
with greater emphasis on the latter point of view. Such series occasion- 
ally make their appearance in engineering applications of the theory, 
but they are most conspicuous in the statistical applications, both 
economico-sociological and meteorologico-geophysical, since in both 
instances the relative lead of one time series with respect to another may 
well give much more information concerning the past of the second than 
of its own. For example, on account of the general eastward movement of 
the weather, Chicago weather may well be more important in the 
forecasting of Boston weather than Boston weather itself. Accordingly, 
we give much prominence here to the discrete case of forecasting. The 
methods of this chapter, as might easily be anticipated, make much use 
of the method of undetermined coefficients and of the linear equations 
(in a finite number of variables) to which these give rise. 

Prediction and filtering do not exhaust the capacity of our methods. 
They may be applied whenever an ideally desirable linear operation on a 
statistically uniform time series is in fact not strictly realizable, although 
an approximation may be realized. Two problems of this sort are those 
of obtaining a derivative of the first or of higher order from data that 
have been corrupted by additive errors of known spectrum and known 
statistical relation to the message, and that of interpolating a continuous 
function between the discrete values of a time series of known statistical 
character. The first of these applications is of very practical use in the 
design of servo-mechanisms, where a derivative computed on too short 
a base of Δt will be so erratic as to be of no value, while a derivative
computed on too long a base will discard much valuable information. 
The interpolation problem is interesting as leading to a formula identical 
in appearance with that employed for extrapolation. These matters are 
taken up in Chapter V, our last chapter. At the end of the entire discus- 
sion, we give a table of Laguerre functions. 
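The conflict between too short and too long a base in computing a derivative can be made concrete. The sketch below is an illustration only and not the method of Chapter V: it assumes the message is sin(ωt), that it is observed with independent additive errors of a known standard deviation, and that the derivative is estimated by the backward difference quotient over a base of length Δt. The function name `rms_error` and all numerical values are hypothetical.

```python
import math

def rms_error(base, omega=1.0, noise_sd=0.01, samples=1000):
    """Root-mean-square error of (f(t) - f(t - base)) / base as an estimate of f'(t).

    f(t) = sin(omega * t) is observed with independent additive errors of
    standard deviation noise_sd. The two independent errors contribute a
    variance of 2 * noise_sd**2 / base**2; the systematic part is averaged
    over one period of the message.
    """
    noise_var = 2.0 * noise_sd ** 2 / base ** 2
    bias_sq = 0.0
    for i in range(samples):
        t = 2.0 * math.pi * i / (omega * samples)
        estimate = (math.sin(omega * t) - math.sin(omega * (t - base))) / base
        bias_sq += (estimate - omega * math.cos(omega * t)) ** 2
    return math.sqrt(noise_var + bias_sq / samples)

for base in (0.001, 0.02, 0.2, 1.0, 3.0):
    print(base, rms_error(base))
```

With a very short base the additive errors, divided by Δt, swamp the result; with a very long base the quotient no longer follows the derivative; an intermediate base does better than either extreme.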

The unity of this book is methodological. It represents an attempt to 
use Fourier methods so as to take significant advantage of the way in 
which “before” and “after” enter into the translation group in time. 
It does not pretend to have drained this field dry. It is the hope of the 
author that those to whose attention it comes, and who find either new 
pure mathematical developments connected with the topics discussed 
or new fields of application in which the methods may be useful, will
bring them to his attention. This applies to any criticisms of the entire 
book, either in what concerns the validity of its mathematical methods 
or in the appropriateness and usefulness of its applications to engineering 
or statistics.


CHAPTER I 


RESUME OF FUNDAMENTAL MATHEMATICAL 
NOTIONS 


1.00 Fourier Series 


The realization of the objectives which we have stated at the end of 
the introductory chapter will require a fairly elaborate mathematical 
technique. This technique is available in the literature and in particular 
in two books in which the author has participated. The first of these is 
his book, The Fourier Integral,* and the second (done in collaboration 
with the late R.E.A.C. Paley of Cambridge University) is entitled, 
Fourier Transforms in the Complex Domain.† As will be seen, the word
“Fourier” appears in the titles of both books. We have already stated 
that the intrinsic invariance of our problems with respect to a translation 
in time, combined with their linear character, makes a recourse to Fourier 
theory inevitable. However, the conventional textbooks on Fourier 
developments devote much more time to what is to us the secondary 
case of the Fourier series than to the Fourier integral; and, even in the 
case of the Fourier integral, they fail to treat of its extension to functions 
which, instead of tending to zero as their argument becomes infinite, 
keep up the same average power level over an infinite time. 

Since this chapter is a résumé of the existing theory, it may be better 
for the reader who first approaches the subject to use it for purposes of 
reference only, proceeding directly to Chapter II, and when in the course 
of the argument a notion comes up which demands reference to a syllabus 
of formulae, to return to the appropriate part of Chapter I. It would 
have been perfectly possible to eliminate this chapter from the treatment 
of our subject, for those readers to whom an adequate mathematical 
library was available, containing the texts already mentioned. Since, 
however, in the majority of cases in which this volume will be of use 
such libraries will not be available, and since the books referred to are 
not sufficiently generally used in routine mathematical education to 
enable us to assume that any large fraction of the readers of this work 
will possess them as relics of their own mathematical training, we have 
no alternative but to present this chapter in full. 

* Cambridge, 1933.
† American Mathematical Society Colloquium Publication, Vol. 19, 1934.

We repeat that, owing to the very nature of the problems considered,
this book makes extensive use of methods of Fourier or trigonometrical 
analysis, and that this is due to the fact that these problems are not tied 
to any origin or epoch but are invariant under a displacement of time, 
in the sense that, if we start a time series late by the interval of time T, 
we shall also start any prediction, or the result of any filtering, late by 
the same time interval T. This statement simply asserts the repeatability 
of our methods and is indeed a necessary condition for the existence of 
any scientific theory whatever. A scientific theory bound to an origin in 
time, and not freed from it by some special mathematical technique, is a 
theory in which there is no legitimate inference from the past to the 
future. If scientific investigation were a game with the world, in which 
all rules were subject to a future revision unknown to us, it would 
scarcely be a game worth playing. A dependence on starting time con- 
notes such a change in the rules. 

This mode of invariance under a translation of the origin in time is 
indeed shown by Newtonian laws of mechanics, the laws of heat flow, 
the laws of electrical flow, the Maxwell equation, etc. It is expressed by 
the fact that they lead to differential equations with coefficients constant 
in the time. 

Besides this general property of the objectives of this book, we shall 
here confine ourselves to linear problems of filtering or prediction. We 
are thus interested in additive classes of functions, which are not tied 
to any fixed origin in time. Here we deal with phenomena where added 
causes produce added effects. An extremely simple class of this sort is
the set of functions

$$a e^{i\omega t},$$

for

$$a e^{i\omega(t+\tau)} = \left(a e^{i\omega\tau}\right) e^{i\omega t}.$$


Indeed these two properties, that of invariance and that of linearity,
are completely diagnostic of the field in which a linear analysis into
trigonometric terms $e^{i\omega t}$ is appropriate.

The simplest representation of a function by linearly additive trigonometric
terms is that found in the theory of Fourier series. Let f(t) be
a (possibly complex valued) integrable function defined over the interval
(−π, +π).† Then, formally, its approximation by a Fourier series will be

$$\sum_{n=-\infty}^{\infty} \frac{e^{int}}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx \sim f(t). \tag{1.000}$$

† Any interval may be chosen, but this is a convenient normalization of the desired
interval. 
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This formal series need not converge everywhere, and, even if it should 
fail to converge anywhere, it may still be useful. If the square of the
modulus of f as well as f itself is integrable, and we form the quadratic
norm

$$\int_{-\pi}^{\pi} \left| f(t) - \sum_{n=-N}^{N} a_n e^{int} \right|^2 dt, \tag{1.0005}$$


this will have the expansion 
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and since 
$$\int_{-\pi}^{\pi} e^{i\nu t}\,dt = \begin{cases} 2\pi & \text{if } \nu = 0; \\ 0 & \text{if } \nu \text{ is an integer other than } 0, \end{cases} \tag{1.0015}$$
this expansion (1.001) may be written 
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This expression is by inspection a minimum when and only when 
$$a_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int}\,dt \qquad (-N \le n \le N), \tag{1.0025}$$


where it should be noted that the value of $a_n$ is the familiar Fourier
coefficient and is independent of N. This minimum remainder value will
then be

$$\int_{-\pi}^{\pi} |f(t)|^2\,dt - \frac{1}{2\pi} \sum_{n=-N}^{N} \left| \int_{-\pi}^{\pi} f(t) e^{-int}\,dt \right|^2, \tag{1.003}$$

which is always positive as it is the integral of the square of a modulus.
Clearly this minimum expression will decrease as N becomes infinite.
That it will decrease to zero may not be immediately concluded but may
be proved to be true. This fact is known as the Parseval theorem and is
equivalent to the fact that $a_n$ as prescribed in (1.0025) minimizes the
expression set up by (1.0005) as N → ∞.
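The behaviour of the minimum remainder (1.003) can be checked numerically. The following is only a sketch under stated assumptions: the integrals are replaced by midpoint-rule sums, and the test function f(t) = t, the sample count, and the names `fourier_coefficient` and `minimum_remainder` are illustrative choices that do not come from the text.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=1024):
    """Approximate a_n = (1/2pi) * integral over (-pi, pi) of f(t) e^{-int} dt
    by the midpoint rule, as in (1.0025)."""
    h = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        t = -math.pi + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)

def minimum_remainder(f, N, samples=1024):
    """Approximate (1.003): integral of |f|^2 minus
    (1/2pi) * sum over |n| <= N of |integral of f e^{-int}|^2.
    The second term equals 2*pi * sum of |a_n|^2."""
    h = 2 * math.pi / samples
    energy = sum(abs(f(-math.pi + (k + 0.5) * h)) ** 2 for k in range(samples)) * h
    captured = sum(2 * math.pi * abs(fourier_coefficient(f, n, samples)) ** 2
                   for n in range(-N, N + 1))
    return energy - captured

def sawtooth(t):
    """Illustrative test function f(t) = t on (-pi, pi)."""
    return t

if __name__ == "__main__":
    for N in (1, 4, 16):
        print(N, minimum_remainder(sawtooth, N))
```

For this test function the remainder stays positive and shrinks as N grows, which is the behaviour the Parseval theorem asserts in the limit.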
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An extremely important mathematical lemma which is useful and may 
now be discussed is that of Schwarz. Let f(t) and g(t) be real, integrable, 
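and of integrable absolute square over the interval −π to +π. Then the
quadratic polynomial in λ

$$\int_{-\pi}^{\pi} [f(t) + \lambda g(t)]^2\,dt = \int_{-\pi}^{\pi} [f(t)]^2\,dt + 2\lambda \int_{-\pi}^{\pi} f(t) g(t)\,dt + \lambda^2 \int_{-\pi}^{\pi} [g(t)]^2\,dt \tag{1.0035}$$

is always positive or zero regardless of the value of λ, and the equation

$$\int_{-\pi}^{\pi} [f(t)]^2\,dt + 2\lambda \int_{-\pi}^{\pi} f(t) g(t)\,dt + \lambda^2 \int_{-\pi}^{\pi} [g(t)]^2\,dt = 0 \tag{1.004}$$

cannot have distinct real roots. If we write the equation

$$c + b\lambda + a\lambda^2 = 0,$$

the discriminant b² − 4ac cannot be positive, and

$$\left[ \int_{-\pi}^{\pi} f(t) g(t)\,dt \right]^2 \le \int_{-\pi}^{\pi} [f(t)]^2\,dt \int_{-\pi}^{\pi} [g(t)]^2\,dt. \tag{1.0045}$$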


As an immediate consequence, even if f(t) and g(t) are complex,

$$\left| \int_{-\pi}^{\pi} f(t) g(t)\,dt \right|^2 \le \int_{-\pi}^{\pi} |f(t)|^2\,dt \int_{-\pi}^{\pi} |g(t)|^2\,dt. \tag{1.005}$$


This is the Schwarz inequality. 
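A numerical illustration of the Schwarz inequality may be helpful. This is a minimal sketch, assuming midpoint-rule quadrature over (−π, π); the helper names `integrate` and `schwarz_sides` are hypothetical, and the inequality is taken in the form in which the squared modulus of the integral of the product is bounded by the product of the integrals of the squared moduli.

```python
import math

def integrate(func, a=-math.pi, b=math.pi, samples=2000):
    """Midpoint-rule approximation to the integral of func over (a, b)."""
    h = (b - a) / samples
    return sum(func(a + (k + 0.5) * h) for k in range(samples)) * h

def schwarz_sides(f, g):
    """Return (left, right) where left = |integral of f*g|^2 and
    right = (integral of |f|^2) * (integral of |g|^2)."""
    left = abs(integrate(lambda t: f(t) * g(t))) ** 2
    right = integrate(lambda t: abs(f(t)) ** 2) * integrate(lambda t: abs(g(t)) ** 2)
    return left, right
```

When g is a real constant multiple of a real f the two sides coincide, which corresponds to the case of a double root of the quadratic in λ.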

Let us now apply the Schwarz inequality to a convergence problem of 
interest in the study of Fourier series. Up to the present we have been 
treating the Fourier series as a purely formal expression without any 
regard to whether it converges or not. We shall use the Fourier series 
and other similar trigonometrical developments to represent physical 
quantities as functions of the time t or whatever other variable we indicate
by t. Now it is obvious that no physical quantity can be observed
for a single precise value of t. No watch or other chronometer is accurate
enough to isolate an instant, but can only isolate a small but positive
interval of time. Thus all functions of t are for the physicist averages
over small ranges of t rather than values at a precise point of t. Therefore
it is important to discuss how such averages behave and how they can
be expressed in terms of the formal series (1.000). Now we have shown
that, if f(t) is integrable and of integrable absolute square, we shall have


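$$\lim_{N\to\infty} \int_{-\pi}^{\pi} \left| f(t) - \frac{1}{2\pi} \sum_{n=-N}^{N} e^{int} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx \right|^2 dt = 0. \tag{1.0055}$$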







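Thus, by the Schwarz inequality, if g(t) is also integrable and of integrable
absolute square

$$\lim_{N\to\infty} \int_{-\pi}^{\pi} \left[ f(t) - \frac{1}{2\pi} \sum_{n=-N}^{N} e^{int} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx \right] g(t)\,dt = 0. \tag{1.006}$$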


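That is,

$$\int_{-\pi}^{\pi} f(t) g(t)\,dt = \lim_{N\to\infty} \frac{1}{2\pi} \sum_{n=-N}^{N} \int_{-\pi}^{\pi} g(x) e^{inx}\,dx \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx. \tag{1.0065}$$

In other words, the series

$$f(t) \sim \sum_{n=-\infty}^{\infty} \frac{1}{2\pi} e^{int} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx, \tag{1.007}$$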


while it has not been shown to converge, does in fact yield a convergent 
series when multiplied term by term with g(t) and integrated; and this 
new series converges to the integral of the product of f with g. In partic- 
ular we may confine our attention to a g(t) differing from zero only over 
a small interval. Thus all local averages of the formal series (1.000) 
converge to the corresponding local averages of f(t). As we have pointed 
out, this is all that we need to make a practical employment of the 
Fourier series for f(t). 
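The partial sum of this series may be written

$$\frac{1}{2\pi} \sum_{n=-N}^{N} e^{int} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t+x) \frac{\sin (N + \frac{1}{2})x}{\sin \frac{1}{2}x}\,dx. \tag{1.0075}$$

Closely related to this, there is a quantity known as the Cesàro partial
sum, which may be written

$$\frac{1}{2\pi} \sum_{n=-N}^{N} \left( 1 - \frac{|n|}{N} \right) e^{int} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx = \frac{1}{2\pi N} \int_{-\pi}^{\pi} f(t+x) \frac{\sin^2 \frac{1}{2}Nx}{\sin^2 \frac{1}{2}x}\,dx. \tag{1.008}$$

Both partial sums represent weighted averages of f(x) with the
points in the neighborhood of x = t weighted especially heavily. In both
cases the total weight of all points around the circle is 1, as we see by

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{\sin (N + \frac{1}{2})x}{\sin \frac{1}{2}x}\,dx = \frac{1}{2\pi N} \int_{-\pi}^{\pi} \frac{\sin^2 \frac{1}{2}Nx}{\sin^2 \frac{1}{2}x}\,dx = 1. \tag{1.0085}$$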


There is, however, this important difference: the weighting for the
ordinary sum assumes both positive and negative signs, with the total
absolute weight becoming infinite as N increases, while the weighting of
the Cesàro sum is non-negative, so that the total absolute weighting is
the same as the total weighting and is 1 in all cases. Thus the Cesàro
partial sum of the Fourier series of a non-negative function is also
non-negative.
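The contrast between the two weightings can be examined numerically. The sketch below assumes the weightings of (1.0075) and (1.008) in the forms sin((N + ½)x)/sin(½x) and sin²(½Nx)/(N sin²(½x)); the function names and the midpoint-rule quadrature are illustrative choices, not part of the text.

```python
import math

def ordinary_weight(N, x):
    """sin((N + 1/2) x) / sin(x / 2), the weighting of the ordinary partial sum."""
    s = math.sin(0.5 * x)
    if abs(s) < 1e-12:
        return 2.0 * N + 1.0          # limiting value at x = 0
    return math.sin((N + 0.5) * x) / s

def cesaro_weight(N, x):
    """sin^2(N x / 2) / (N sin^2(x / 2)), the weighting of the Cesaro partial sum."""
    s = math.sin(0.5 * x)
    if abs(s) < 1e-12:
        return float(N)               # limiting value at x = 0
    return math.sin(0.5 * N * x) ** 2 / (N * s * s)

def circle_average(weight, N, samples=4000, absolute=False):
    """(1/2pi) * integral over (-pi, pi) of the weight, or of its absolute value."""
    h = 2.0 * math.pi / samples
    total = 0.0
    for k in range(samples):
        w = weight(N, -math.pi + (k + 0.5) * h)
        total += abs(w) if absolute else w
    return total * h / (2.0 * math.pi)
```

With `absolute=True` the average of the ordinary weighting grows slowly with N, whereas the Cesàro weighting gives 1 with or without the absolute value.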

It will be seen that the coefficients of a Fourier series are given as 
integrals. When the function f(t) is continuous, save perhaps for a
finite number of finite jumps, the sort of integration used is that given
in the ordinary textbooks. However, there are many cases in which it is
desirable to represent in a formal Fourier series a function corresponding
to the sum $\sum_{n=-\infty}^{\infty} a_n e^{int}$ for which we know only that $\sum_{n=-\infty}^{\infty} |a_n|^2$ converges.

In order to do this we must adopt an extension of the usual notion of
integration. There is one notion of integration and just one which fits 
the needs of the case, and that is the definition given by Lebesgue. This 
is not the place for a general exposition of the theory of integration, 
but it may be said that Lebesgue’s form of the integral has all the 
ordinary properties which are desirable in an integral. For example, 
if two functions have an integral, then their sum has an integral, and the 
integral of the sum is the sum of the integrals of the two functions taken 
separately. If a sequence of functions tends boundedly to a limit func- 
tion, and if all the approximating functions have integrals, then the limit 
function has an integral which is the limit of the integrals of the approx- 
imating functions. A non-negative function, if it has an integral at all, 
has a non-negative integral. Again, an increasing sequence of functions 
which is such that the sequence of their integrals remains bounded tends 
“almost everywhere” to a limit function whose integral is the limit of 
the integrals of the approximating functions.

Let it be noted that the expression “almost everywhere” means this:
two functions that are identical almost everywhere differ by a function, 
the integral of whose modulus is zero, and may be considered for 
practical purposes as the same function. 

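It is not only true, on the basis of this definition, that, if f(t) is integrable
and of integrable absolute square, then

$$\lim_{N\to\infty} \int_{-\pi}^{\pi} \left| f(t) - \frac{1}{2\pi} \sum_{n=-N}^{N} e^{int} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx \right|^2 dt = 0, \tag{1.0087}$$

but it is also true that, if $\sum_{n=-\infty}^{\infty} |a_n|^2 < \infty$, then there exists a function
f(t), integrable and of integrable square modulus, such that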
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This function will be determinate “almost everywhere’; but as a single 
point is a set of measure zero it may be changed at any single point 
whatever. 

The theorem we have just cited is known as the Riesz-Fischer theorem. 
Strictly speaking, it is an immediate deduction from another of less 
specific appearance, known as Weyl’s lemma. This latter asserts that, if 
$\{f_n(t)\}$ is a sequence of integrable functions of integrable square modulus
over (a, b), and

$$\lim_{m,n\to\infty} \int_a^b |f_m(t) - f_n(t)|^2\,dt = 0, \tag{1.0089}$$

then there exists a function f(t), integrable and of integrable square
modulus, such that

$$\lim_{n\to\infty} \int_a^b |f(t) - f_n(t)|^2\,dt = 0. \tag{1.0090}$$

By applying this to the sequence

$$f_n(t) = \sum_{\nu=-n}^{n} a_\nu e^{i\nu t}, \tag{1.0091}$$

the Riesz-Fischer theorem follows.

We shall find it convenient to write 

$$f(t) = \operatorname*{l.i.m.}_{n\to\infty} f_n(t) \tag{1.0093}$$

for

$$\lim_{n\to\infty} \int_a^b |f(t) - f_n(t)|^2\,dt = 0. \tag{1.0094}$$

Similar modes of writing will be used for the infinite or semi-infinite
interval replacing (a, b), or a continuous variable replacing n. We shall
write $f(t) \sim \sum_{n=1}^{\infty} a_n \varphi_n(t)$ for $f(t) = \operatorname*{l.i.m.}_{N\to\infty} \sum_{n=1}^{N} a_n \varphi_n(t)$.


1.01 Orthogonal Functions


The theory of Fourier series is but one chapter in the theory of sets 
of orthogonal functions. A set of functions $\varphi_n(t)$ is said to be normal and
orthogonal over (a, b) if it consists of integrable functions for which

$$\int_a^b |\varphi_n(t)|^2\,dt = 1; \tag{1.010}$$

and if, whenever m ≠ n,

$$\int_a^b \varphi_m(t) \overline{\varphi_n(t)}\,dt = 0. \tag{1.011}$$
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As in the trigonometric case, if f(t) is an integrable function of integrable 
square modulus, the expression

$$\int_a^b \left| f(t) - \sum_{n=1}^{N} a_n \varphi_n(t) \right|^2 dt \tag{1.012}$$

is minimized by putting

$$a_n = \int_a^b f(t) \overline{\varphi_n(t)}\,dt. \tag{1.013}$$


Any set of functions $\psi_n(t)$, integrable and of integrable square modulus,
may be used as the point of departure in forming a set of linear
combinations $\varphi_n$ of the $\psi_n$'s which will be normal and orthogonal, and
for which every $\psi_n$ is a linear combination of a finite number of $\varphi_n$'s,
except at a set of points of zero measure.

To illustrate the process of forming orthogonal sets, let us start with
$\psi_1(t)$, and let it be not equivalent to 0. Let us put

$$\varphi_1(t) = \frac{\psi_1(t)}{\sqrt{\int_a^b |\psi_1(t)|^2\,dt}}. \tag{1.014}$$

Taking $\psi_2(t)$, if it is not equivalent to a multiple of $\psi_1(t)$, we put
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ga (t) = (1.015) 
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If 3 is not equivalent to a linear combination of ¥,(f) and We(t), we put 


g3(t) = 
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and so on. 

A set of functions, whether normal and orthogonal or not, is said to
be complete or closed if there is no function orthogonal to every function
of the set. If a set of normal and orthogonal functions $\varphi_n(t)$ is not closed,
and if f(t) is not equivalent to zero and is orthogonal to every function
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of the set, we have of course 


$$\sum_{n} \left| \int_a^b f(t) \overline{\varphi_n(t)}\,dt \right|^2 < \int_a^b |f(t)|^2\,dt. \tag{1.017}$$


On the other hand, we always have for any f(é) 
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the set is not closed but can be enlarged by adjoining another function. 
Thus, if a normal and orthogonal set is closed and f(t) is any integrable 
function of integrable square modulus, then 


and if 
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The definitions of closure, normality, and orthogonality may be 
applied to functions over a semi-infinite interval (0, ∞) or the complete
infinite interval (−∞, ∞), as well as to functions defined over a finite
interval. The one change which has to be made is that the class of functions
which are integrable and of integrable square modulus must now be
replaced by functions of Lebesgue class L², of integrable modulus over every
finite interval and of integrable square modulus over the infinite interval.
For example, the function $1/\sqrt{1+\omega^2}$ is integrable over every finite
interval and of integrable square modulus over the whole range of ω,
yet

$$\int_{-\infty}^{\infty} \frac{d\omega}{\sqrt{1+\omega^2}} = \infty. \tag{1.0195}$$


With this change, Weyl’s lemma holds for the infinite intervals, and the 
analogue of the Riesz-Fischer theorem holds for functions orthogonal 
over such intervals. The Schwarz inequality holds alike for finite and 
infinite intervals. 

A set of normal and orthogonal functions particularly important for 
our purpose is that obtained by orthogonalizing the set of functions
$x^n e^{-x}$ (0 ≤ n < ∞) over the interval (0, ∞). These functions are
known as the Laguerre functions and, when divided by $e^{-x}$, as the
Laguerre polynomials. We shall see at a later point how to compute them
more expeditiously. 
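The orthogonalization just described can be carried out directly. The following sketch assumes real functions of the form p(t)e^{-t} with p a polynomial, represents each by its coefficient list, and uses the closed form of the integral of t^m e^{-2t} over (0, ∞), namely m!/2^{m+1}, for the inner product. The names `inner` and `orthonormalize` are illustrative, and the signs of the resulting functions depend on this particular procedure.

```python
import math

def inner(p, q):
    """Integral over (0, inf) of p(t) e^{-t} * q(t) e^{-t} dt, where p and q are
    real polynomial coefficient lists (p[j] multiplies t**j)."""
    return sum(pj * qk * math.factorial(j + k) / 2.0 ** (j + k + 1)
               for j, pj in enumerate(p) for k, qk in enumerate(q))

def orthonormalize(count):
    """Successively orthogonalize t^n e^{-t}, n = 0 .. count-1, and normalize.
    Returns the polynomial factors as coefficient lists."""
    basis = []
    for n in range(count):
        p = [0.0] * n + [1.0]                      # the monomial t^n
        for phi in basis:
            c = inner(p, phi)                      # component along phi
            padded = phi + [0.0] * (len(p) - len(phi))
            p = [a - c * b for a, b in zip(p, padded)]
        norm = math.sqrt(inner(p, p))
        basis.append([a / norm for a in p])
    return basis
```

Under these assumptions the first function is √2·e^{-t} and the second is √2·(2t − 1)e^{-t}; the process becomes numerically delicate for large counts, which is one reason a more expeditious method is desirable.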
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1.02 The Fourier Integral 


We now come to a theory pertaining primarily to the infinite interval. 
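Let f(t) belong to L² over all t, and let

$$F_A(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-A}^{A} f(t) e^{-i\omega t}\,dt. \tag{1.020}$$

Then a theorem due to Plancherel § asserts that there exists a function
F(ω), likewise belonging to L², such that

$$\lim_{A\to\infty} \int_{-\infty}^{\infty} |F(\omega) - F_A(\omega)|^2\,d\omega = 0. \tag{1.0205}$$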
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It further asserts|} that 


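$$\int_{-\infty}^{\infty} |F(\omega)|^2\,d\omega = \int_{-\infty}^{\infty} |f(t)|^2\,dt, \tag{1.021}$$

and that

$$\lim_{A\to\infty} \int_{-\infty}^{\infty} \left| f(t) - \frac{1}{\sqrt{2\pi}} \int_{-A}^{A} F(\omega) e^{i\omega t}\,d\omega \right|^2 dt = 0. \tag{1.0215}$$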
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The function F(w) thus defined is said to be the Fourier transform 
of f(t). 

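If now G(ω) is the Fourier transform of g(t), since the Fourier transform
is additive, F(ω) + G(ω) will be the Fourier transform of f(t) +
g(t); F(ω) − G(ω), that of f(t) − g(t); F(ω) + iG(ω), that of f(t) +
ig(t); and F(ω) − iG(ω), that of f(t) − ig(t). Hence

$$\int_{-\infty}^{\infty} \{F\overline{F} + F\overline{G} + G\overline{F} + G\overline{G}\}\,d\omega = \int_{-\infty}^{\infty} \{f\overline{f} + f\overline{g} + g\overline{f} + g\overline{g}\}\,dt;$$

$$\int_{-\infty}^{\infty} \{F\overline{F} - F\overline{G} - G\overline{F} + G\overline{G}\}\,d\omega = \int_{-\infty}^{\infty} \{f\overline{f} - f\overline{g} - g\overline{f} + g\overline{g}\}\,dt;$$

$$\int_{-\infty}^{\infty} \{F\overline{F} - iF\overline{G} + iG\overline{F} + G\overline{G}\}\,d\omega = \int_{-\infty}^{\infty} \{f\overline{f} - if\overline{g} + ig\overline{f} + g\overline{g}\}\,dt;$$

$$\int_{-\infty}^{\infty} \{F\overline{F} + iF\overline{G} - iG\overline{F} + G\overline{G}\}\,d\omega = \int_{-\infty}^{\infty} \{f\overline{f} + if\overline{g} - ig\overline{f} + g\overline{g}\}\,dt. \tag{1.022}$$

Here F, G stand for F(ω), G(ω) and f, g for f(t), g(t).

§ See Titchmarsh, Theory of Functions, p. 436.
‖ See Wiener, Fourier Integral, p. 196.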


We take the first of these equations as it stands, the negative of the
second, the third multiplied by i, and the fourth multiplied by −i.
We then add and normalize by dividing by four, thus obtaining

$$\int_{-\infty}^{\infty} F(\omega) \overline{G(\omega)}\,d\omega = \int_{-\infty}^{\infty} f(t) \overline{g(t)}\,dt. \tag{1.0225}$$


It results from this that a set of normal and orthogonal functions of t
goes over, upon Fourier transformation, into a set of normal and orthogonal
functions of ω. Similarly, t-closure goes over into ω-closure.


1.03 Laguerre Functions 


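A very interesting set of normal and orthogonal functions of ω is the
set

$$l_n(\omega) = \frac{(1 - i\omega)^n}{\sqrt{\pi}\,(1 + i\omega)^{n+1}} \qquad (n = 0, 1, 2, \ldots). \tag{1.030}$$

We have

$$\int_{-\infty}^{\infty} |l_n(\omega)|^2\,d\omega = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{d\omega}{1 + \omega^2} = 1; \tag{1.0305}$$

and, on the other hand, if m ≠ n,

$$\int_{-\infty}^{\infty} l_m(\omega) \overline{l_n(\omega)}\,d\omega = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{(1 - i\omega)^{m-n-1}}{(1 + i\omega)^{m-n+1}}\,d\omega = 0. \tag{1.031}$$

This we establish by Cauchy’s theorem, since one half-plane of the
integrand is free from singularities.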
Let us now consider the function of t which is 0 for negative arguments,
and which is $t^n e^{-t}$ for positive arguments. Its Fourier transform is

$$\frac{1}{\sqrt{2\pi}} \int_0^{\infty} t^n e^{-t} e^{-i\omega t}\,dt = \frac{n!}{\sqrt{2\pi}\,(1 + i\omega)^{n+1}}. \tag{1.0315}$$
Thus $l_n(\omega)$ may be expressed by the expansion

$$l_n(\omega) = \frac{1}{\sqrt{\pi}} \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!} \frac{(-1)^{n-k}\, 2^k}{(1 + i\omega)^{k+1}}. \tag{1.032}$$

Furthermore, $l_n(\omega)$ will be the transform of the function $L_n(t)$, which
latter function will be 0 for negative values of t, while it will assume
the value

$$L_n(t) = \sqrt{2}\, e^{-t} \sum_{k=0}^{n} \frac{n!}{(k!)^2\,(n-k)!} (-1)^{n-k} (2t)^k \tag{1.0325}$$

for positive arguments.

With the aid of these functions, it is very easy to give an approximate
representation of any function of class L², over the interval (0, ∞), in
terms of a linear combination of functions which have rational Fourier
transforms. The functions $L_n(t)$ are known as Laguerre functions and
are the products of $e^{-t}$ with polynomials known as Laguerre polynomials.
Tables of them are given in Appendix A. 
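The normal and orthogonal character of the rational functions of ω can be verified numerically. This sketch assumes the form $l_n(\omega) = (1 - i\omega)^n / (\sqrt{\pi}(1 + i\omega)^{n+1})$ and substitutes ω = tan(θ/2) so that the infinite range becomes (−π, π); the function names and the quadrature rule are illustrative choices.

```python
import math

def laguerre_transform(n, w):
    """Assumed form l_n(w) = (1 - i w)^n / (sqrt(pi) (1 + i w)^(n + 1))."""
    return (1 - 1j * w) ** n / (math.sqrt(math.pi) * (1 + 1j * w) ** (n + 1))

def inner_product(m, n, samples=2000):
    """Integral over all w of l_m(w) * conj(l_n(w)), computed with the
    substitution w = tan(theta / 2), dw = (1 + w^2) / 2 dtheta."""
    h = 2.0 * math.pi / samples
    total = 0j
    for k in range(samples):
        theta = -math.pi + (k + 0.5) * h
        w = math.tan(0.5 * theta)
        jacobian = 0.5 * (1.0 + w * w)
        total += laguerre_transform(m, w) * laguerre_transform(n, w).conjugate() * jacobian
    return total * h
```

Under this substitution the integrand reduces to a single trigonometric term in θ, which is why the orthogonality is visible without any appeal to contour integration.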


1.04 More on the Fourier Integral; Realizability of Filters 


To return to the Fourier transform, properties determining the smoothness
of a function generally correspond to properties determining the
behavior of its Fourier transform at infinity. The Fourier transform of
−itf(t) is F′(ω), and the Fourier transform of f′(t) is iωF(ω). In general,
a function and its Fourier transform cannot be simultaneously
very small at infinity. More precisely, if for all t,

$$|f(t)| \le \text{const}\; e^{-t^2/2}, \tag{1.040}$$

and if for all ω,

$$|F(\omega)| \le \text{const}\; e^{-\omega^2/2}, \tag{1.0405}$$

then

$$f(t) = \text{const}\; e^{-t^2/2}. \tag{1.041}$$

Further, if for all t,

$$|f(t)| \le \text{const}\; (1 + |t|^n)\, e^{-t^2/2}, \tag{1.0415}$$

and if for all ω,

$$|F(\omega)| \le \text{const}\; (1 + |\omega|^n)\, e^{-\omega^2/2}, \tag{1.042}$$

then

$$f(t) = P(t)\, e^{-t^2/2}, \tag{1.0425}$$

where P(t) is a polynomial of degree not exceeding n. If f(t) vanishes
for negative values of t, the integral

$$\int_{-\infty}^{\infty} \frac{|\log |F(\omega)||}{1 + \omega^2}\,d\omega \tag{1.043}$$

must remain finite. Again, if H(ω) belongs to L² and

$$\int_{-\infty}^{\infty} \frac{|\log H(\omega)|}{1 + \omega^2}\,d\omega < \infty, \quad \text{and} \quad H(\omega) \ge 0, \tag{1.0435}$$

then there exists a function F(ω), with H(ω) as its absolute value, which
is the Fourier transform of a function f(t) vanishing for negative values
of t.

This last fact plays a very important part in the theory of filters. It 
states that, in any realizable network whatever, the attenuation, taken
as a function of the frequency ω, and divided by 1 + ω², yields an
absolutely integrable function of the frequency. This results from the
fact that the attenuation is the logarithm of the absolute value of the
transform of f(t) which vanishes for negative t; or, in other words,
because strictly no network can foretell the future. Thus no filter can
have infinite attenuation in any finite band. The perfect filter is physically
unrealizable by its very nature, not merely because of the paucity
of means at our disposal. No instrument acting solely on the past has a
sufficiently sharp discrimination to separate one frequency from another 
with unfailing accuracy. 
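The integrability condition on the attenuation can be explored with two illustrative amplitude characteristics, neither of which is taken from the text: H(ω) = 1/(1 + ω²), for which the integral of |log H(ω)|/(1 + ω²) converges, and H(ω) = exp(−ω²), for which it grows without bound as the range of integration is widened. The sketch below assumes midpoint-rule quadrature over (−W, W), and all names are hypothetical.

```python
import math

def truncated_log_integral(log_abs_H, W, samples=20000):
    """Midpoint-rule value of the integral over (-W, W) of |log H(w)| / (1 + w^2)."""
    h = 2.0 * W / samples
    total = 0.0
    for k in range(samples):
        w = -W + (k + 0.5) * h
        total += abs(log_abs_H(w)) / (1.0 + w * w)
    return total * h

def log_rational(w):
    """log H for the illustrative characteristic H(w) = 1 / (1 + w^2)."""
    return -math.log(1.0 + w * w)

def log_gaussian(w):
    """log H for the illustrative characteristic H(w) = exp(-w^2)."""
    return -w * w

if __name__ == "__main__":
    for W in (10.0, 100.0, 1000.0):
        print(W, truncated_log_integral(log_rational, W),
              truncated_log_integral(log_gaussian, W))
```

On this criterion the rational characteristic can belong to a function vanishing for negative t, while the Gaussian one, whose attenuation grows too fast, cannot; a characteristic vanishing over a whole band fails even more strongly, since its logarithm is infinite there.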


1.1 Generalized Harmonic Analysis 


The impedances or admittances or voltage transfer ratios of electric 
circuits are in many cases quantities possessing well-defined Fourier 
transforms. The messages transmitted by a communication circuit, if 
we idealize them, as we here do, in such a way that they are not tied to 
any origin in time, do not have such transforms. To cover these, or even 
to cover the analysis of sums of trigonometric terms of incommensurable 
frequencies, we need an extended concept of Fourier analysis. The author 
has succeeded in doing this, § and what follows represents a résumé of 
his results in this direction. 

Let us start with a function f(t), for which

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |f(t)|^2\,dt = A. \tag{1.10}$$

Then it is also true that

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T+a}^{T+a} |f(t)|^2\,dt = A. \tag{1.105}$$


§ Acta Mathematica, Vol. 55, 1930, p. 273.
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To prove this, let us observe that 
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1 —s 
tim 35 J, [IO Phat = 4. (1.12) 
ai for the ahi bound of oscillation at ~, 
tim © 14@ Pat - ze f\1 Pal <s4-a =o 
(1.125) 


from which our statement (1.105) follows at once. 
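Now let us assume that

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t+\tau) \overline{f(t)}\,dt = \varphi(\tau) \tag{1.13}$$

exists as a finite limit for every real τ. Then, if for a finite N we define

$$g(t) = \sum_{k=1}^{N} a_k f(t + \tau_k), \tag{1.135}$$

it follows that

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |g(t)|^2\,dt \tag{1.14}$$

will always exist and be finite. In particular, both the real

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |f(t+\tau) \pm f(t)|^2\,dt \tag{1.145}$$

and the complex

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |f(t+\tau) \pm i f(t)|^2\,dt \tag{1.15}$$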
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quadratics will exist and be finite. Furthermore, the complex argument 


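$$f(t+\tau) \overline{f(t)} = \tfrac{1}{4} \left\{ |f(t+\tau) + f(t)|^2 - |f(t+\tau) - f(t)|^2 + i\,|f(t+\tau) + i f(t)|^2 - i\,|f(t+\tau) - i f(t)|^2 \right\}. \tag{1.151}$$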


Thus the integrals 
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and as before 
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II 


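Again, by the Schwarz inequality,

$$\left| \frac{1}{2T} \int_{-T}^{T} f(t+\tau) \overline{f(t)}\,dt \right|^2 \le \frac{1}{2T} \int_{-T}^{T} |f(t+\tau)|^2\,dt \cdot \frac{1}{2T} \int_{-T}^{T} |f(t)|^2\,dt. \tag{1.153}$$

Hence

$$|\varphi(\tau)| \le \varphi(0). \tag{1.154}$$

Thus in the vicinity of zero

$$\varphi(0) \ge \varlimsup_{\epsilon\to 0} |\varphi(\epsilon)|. \tag{1.155}$$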
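An example such as

$$f(t) = e^{it^2} \tag{1.156}$$

will show that the equality cannot in general be used to replace the
inequality; for here

$$\varphi(0) = 1; \qquad \varphi(\epsilon) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} e^{i(t+\epsilon)^2 - it^2}\,dt = \lim_{T\to\infty} \frac{e^{i\epsilon^2}}{2T} \int_{-T}^{T} e^{2i\epsilon t}\,dt = 0 \quad (\epsilon \ne 0). \tag{1.157}$$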
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In those cases in which the function ¢(é) is discontinuous at ¢ = 0 
there is a discrepancy between two ways of measuring what we may consider
the total power of the motion f(t). Of course, φ(0) is a measure of
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the total power of f(t); but, if we attempt to specify that part of the 

power which lies between frequencies −A and A and let A tend to ∞,
we find the limit which we obtained will be the same as $\lim_{\epsilon\to 0} \varphi(\epsilon)$. In
other words, if φ(t) is discontinuous at the origin, then there is a portion
of the energy which does not belong to any finite frequencies, and which
in a certain sense we must associate with infinite frequencies. In the
example given we have a function whose oscillation becomes of higher
and higher frequencies the further we recede from the origin. It is
therefore natural to consider a part, and in this case the whole, of the
energy to be at infinite frequency. For the purpose of our future work,
we wish to exclude this case, and correspondingly we wish to make
φ(t) continuous at the origin. That is

$$\varphi(0) = \lim_{\epsilon\to 0} \varphi(\epsilon). \tag{1.158}$$
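Under all circumstances,

$$|\varphi(\tau) - \varphi(\epsilon + \tau)| = \left| \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} [f(t+\tau) - f(t+\tau+\epsilon)] \overline{f(t)}\,dt \right| \tag{1.159}$$

and by the Schwarz inequality

$$\le \varlimsup_{T\to\infty} \sqrt{ \frac{1}{2T} \int_{-T}^{T} |f(t+\tau) - f(t+\tau+\epsilon)|^2\,dt \cdot \frac{1}{2T} \int_{-T}^{T} |f(t)|^2\,dt }.$$

That is, using R to denote “real part of”

$$\le \sqrt{ [\varphi(0) - 2R\{\varphi(\epsilon)\} + \varphi(0)]\, \varphi(0) }.$$

Thus, if φ(t) is continuous at the origin, the difference |φ(τ) − φ(ε + τ)|
tends to 0 as ε tends to 0, and φ(t) is everywhere continuous. This case
is so much the most important that we shall assume it without more ado
in all the practical applications we make of φ(t).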

Let us proceed to find a weighting function K(t) which is of limited
total variation, and let

$$\int_{-\infty}^{\infty} \varphi(\tau + \sigma)\,dK(\sigma) \tag{1.16}$$

exist. This will have the value
a ——— 1 Ye iD d 
L dK(r) Less aL itt r+ a)f(d) dt 
2 ———— 1 T+r ae 
= [RO bo oe Si 14+ ICH at 


-f{" iG lim +f iis Yes d. (1.161) 
— —* T- 0 2T J-r 5 gee : 
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There are many conditions under which the limit sign and the first 
integration may be interchanged. This will be the case, for instance, if
K varies only over a finite interval, or if f(t) is bounded, both of which
occur frequently in practice. We shall here assume that one of these
justifying conditions is fulfilled. Then


J ole +) aK = tim oa J. Ko f- fit + afd — 7 at 


= lim 55 aS. flt + @) i. J@—ndkG@ dt. (1.162) 


In a similar way, and under similar conditions, 


[OG [" e@ - 0) ake) 
e+. « a r oe 
= [ G [7 aK) lim on f tr fat 
. ~ T+r 
“ E aK@) f_ aK) im at. ft- ofl — 1) at 
eo ° Fs a 
i dK@ f © aK (@) Jima sp f f(t of= a) dt 
lim nf ko f" ake) Lite eo 


lim nf ral j(t~ 0) dK(o)| « 


It 


i 


(1.163) 





It is easy to show that, if o(0) exists, or if even the considerably looser 
condition lim y(e) < © obtains, then 


0 
2 
‘x sak - i <@ (1.164) 


Accordingly, we may show that the required generalized Fourier trans- 
form given by the limit-in-the-mean 


G(w) = Lim. =| t+ fe a 


1 1f()C1 — e**) 
ee a dt (1.165 
V29 ut ( ) 
exists. This would be the integral of the Fourier transform of f(t), with 
an appropriate choice of constant of integration, if such a transform 
existed. 
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A very important theorem concerning the function G(w) is that, if 
v(r) exists and is continuous, it is identical with the quadratic average 


ae ee 
e{r) = eS ue | GW + €) — Gw — «) [Pe do. (1.166) 


From this it may readily be deduced that there exists a monotonically
increasing function Λ(ω) of finite variation, generating φ(τ) as a transform

$$\varphi(\tau) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega\tau}\,d\Lambda(\omega). \tag{1.167}$$

This function Λ(ω), which we shall know as the integrated spectrum or
integrated periodogram of f(t), may be recovered from φ(τ) by the process


safe fe 


ae pee = Deane 
Vn ar 


If Λ(ω) is modified by an additive constant in such a way that it
vanishes when ω → −∞, it then represents (on an appropriate scale)
the total power in the spectrum of f(t) between ω = −∞ and the frequency
ω. The word “power” is used instead of “energy,” because the phenomenon
represented by f(t) is a continuing one, having a finite mean
over all time, instead of a transient one, having a finite integral over all
time. This power is positive over every region of frequency, and consequently
Λ(ω) is monotonically increasing, but this increase may occur
in several ways. If


= A(w) + const. (1.168) 


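$$f(t) = \sum_{j=1}^{n} A_j e^{i\lambda_j t}, \tag{1.169}$$

then

$$\varphi(\tau) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \sum_{j=1}^{n} \sum_{k=1}^{n} A_j \overline{A_k}\, e^{i\lambda_j (t+\tau) - i\lambda_k t}\,dt = \sum_{j=1}^{n} |A_j|^2 e^{i\lambda_j \tau} \tag{1.170}$$

and

$$\Lambda(\omega) = \sqrt{\frac{\pi}{2}} \sum_{j=1}^{n} |A_j|^2 \operatorname{sgn}(\omega - \lambda_j), \quad \text{where } \operatorname{sgn} x = \begin{cases} 1 & \text{if } x > 0; \\ 0 & \text{if } x = 0; \\ -1 & \text{if } x < 0. \end{cases} \tag{1.171}$$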


Accordingly, Λ(ω) thus defined is an ever-increasing step-function. A
slightly more general case is that in which Λ(ω), though not a function
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having a necessarily finite number of steps, has at most a denumerable 
set of points of increase. We shall see that the Brownian motion, among
other phenomena, will give rise to a Λ(ω) which is the integral of its
averaged positive derivative, where we may accordingly write

$$\varphi(\tau) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega\tau} \Lambda'(\omega)\,d\omega. \tag{1.172}$$

Another type of Λ(ω), known from actual instances,* is continuous
everywhere but grows over a set of points of Lebesgue measure 0 and is
not absolutely continuous. Finally, Λ(ω) may be the sum of parts of any
two of these types or of all three. The type which is most significant in
prediction problems is that in which Λ(ω) is the weighted integral of its
average derivative.
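The line-spectrum case can be illustrated by comparing a long time average with the closed form that a sum of trigonometric terms should yield, namely the sum of $|A_j|^2 e^{i\lambda_j\tau}$. The sketch below is hedged accordingly: the amplitudes, the incommensurable frequencies 1 and √2, the averaging length T, and all function names are illustrative assumptions.

```python
import cmath
import math

AMPLITUDES = (1.0, 0.5)                 # illustrative A_j
FREQUENCIES = (1.0, math.sqrt(2.0))     # illustrative incommensurable lambda_j

def f(t):
    """A sum of trigonometric terms A_j e^{i lambda_j t}."""
    return sum(a * cmath.exp(1j * lam * t) for a, lam in zip(AMPLITUDES, FREQUENCIES))

def time_average_autocorrelation(tau, T=500.0, samples=20000):
    """(1/2T) * integral over (-T, T) of f(t + tau) conj(f(t)) dt, midpoint rule."""
    h = 2.0 * T / samples
    total = 0j
    for k in range(samples):
        t = -T + (k + 0.5) * h
        total += f(t + tau) * f(t).conjugate()
    return total * h / (2.0 * T)

def line_spectrum_autocorrelation(tau):
    """Sum of |A_j|^2 e^{i lambda_j tau}, the limit expected for distinct lambda_j."""
    return sum(abs(a) ** 2 * cmath.exp(1j * lam * tau)
               for a, lam in zip(AMPLITUDES, FREQUENCIES))
```

For finite T the cross terms between different frequencies do not vanish exactly; they decay roughly like the reciprocal of T times the frequency difference, so the agreement improves as T is lengthened.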


1.18 Discrete Arrays and Their Spectra 


Parallel to this theory of the harmonic analysis of continuous phenomena
not changing their scale with the increase of time there runs a
theory of discrete phenomena. We start with a time sequence $f_n$, for
which

$$\varphi_m = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{n=-N}^{N} f_{n+m} \overline{f_n} \tag{1.180}$$

exists for every m. As before,

$$|\varphi_m| \le \varphi_0, \tag{1.181}$$
and 
*, li is 
z a <0, (1.182) 


where the Σ′ indicates that the zero term is missing. Accordingly, the
function G(ω), again given by the limit-in-the-mean


pie saan 1 
Ge) = Li : a 
) esa oe = Qn re = ip 2Qr 


exists. We have likewise


fot) (1.183) 


me = tim f |G +e) — Gl —) Pe" dw (1.184) 


0 4 


and we may write 


eee TN) 


*See Wiener and Mahler, Spectrum of an Array, Journal of Mathematics and 
Physics, Vol. 6, 1927, pp. 145-163.

where Λ(ω) is a monotonically increasing function of limited total variation
over (−π, π). As before, Λ(ω) may contain discontinuous, absolutely
continuous, and non-absolutely continuous parts. We shall have
as before

Yee a oe t. (1.185) 

et a a + ee =A(w)+ const. (1. 

In both the discrete and the continuous cases, the jumps of Λ(ω)
represent what we think of physically as lines in the spectrum. Continuous
spectra are familiar in spectroscopy, although the non-absolutely
continuous variety does not seem to have put in its appearance there.


1.2 Multiple Harmonic Analysis and Coherency Matrices 


Besides the problems concerning a single time series, spectroscopy 
has its problems of polarization, coherence, and the like, which concern 
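several time series simultaneously. If $f_1(t), f_2(t), \ldots, f_n(t)$ are an array
of functions, the generalization of φ(τ) will be the double array

$$\varphi_{jk}(\tau) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f_j(t+\tau) \overline{f_k(t)}\,dt. \tag{1.20}$$

Here by symmetry

$$\varphi_{jk}(\tau) = \overline{\varphi_{kj}(-\tau)}. \tag{1.205}$$

We shall be able to write

$$\varphi_{jk}(\tau) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega\tau}\,d\Lambda_{jk}(\omega), \tag{1.210}$$

where

$$\Lambda_{jk}(\omega) = \overline{\Lambda_{kj}(\omega)}; \tag{1.215}$$

or, in other words, the matrix $\|\Lambda_{jk}(\omega)\|$ is Hermitian, as is the matrix
$\|\Lambda'_{jk}(\omega)\|$, when the $\Lambda_{jk}$ are absolutely continuous. The form

$$\sum_{j,k} \Lambda'_{jk}(\omega)\, a_j \overline{a_k} \tag{1.220}$$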
is in general a positive definite Hermitian form. The problem of deter- 


mining the principal axes of symmetry of this form reduces to the 
problem of determining linear combinations 


$$g_j(t) = \sum_{k} a_{jk} f_k(t) \tag{1.225}$$

such that if

$$\psi_{jk}(\tau) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} g_j(t+\tau) \overline{g_k(t)}\,dt \tag{1.230}$$




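and

$$\psi_{jk}(\tau) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega\tau}\,dM_{jk}(\omega); \qquad M_{jk}(-\infty) = 0; \tag{1.235}$$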
then 
Mi(o) = ‘ te a % (1.240) 
We shall have 


|| Mf se(o) |] = |] ase El + |] Atm Co) |] + |] Oma II (1.245) 


We shall call the matrix $\|\Lambda_{jk}(\omega)\|$ the coherency matrix of the set
$\{f_j(t)\}$. If all terms outside the principal diagonal vanish, and only then,
the various time series $f_j(t)$ are incoherent. The study of the state of
polarization of light reduces to the study of the coherency matrix of two 
components at right angles. 

The theory of coherency matrices is almost the same for discrete time 
series as for continuous time series. 


1.3 Smoothing Problems 


For both discrete and continuous time series, in view of the additional 
chapters of this book, a certain theory of approximation is desirable. 
The manipulations to come later on are much facilitated if $\varphi_j$ is an
expression which vanishes if |j| > N, or if φ(τ) is a function with a
Fourier transform which is rational. In both cases, Λ(ω) must be monotonically
increasing, or Λ′(ω) positive for all ω. In the discrete case,
this is secured by taking as our approximate $\varphi_j$ the expression

$$\begin{cases} \varphi_j \left( 1 - \dfrac{|j|}{N} \right) & \text{where } |j| < N; \\ 0 & \text{where } |j| \ge N; \end{cases} \tag{1.30}$$

which will make the approximate Λ′(ω), or

$$\Lambda'(\omega) \approx \frac{1}{\sqrt{2\pi}} \sum_{n=-N}^{N} \varphi_n \left( 1 - \frac{|n|}{N} \right) e^{-in\omega}, \tag{1.31}$$

non-negative. In the continuous case, we form
A’ (w) “> E ein tan-! t: ‘Ae (u)e~™* tan-!'« 2 du (1 32) 
= ra a 1 + ut 'y * 


or perhaps 
A’ (w) (1 + w?) ~- = ni i A’ (ue 82" dy, (1.33) 
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We then represent A’(w) or A’ (w)(1 + w*) by the approximate Fourier 
development 


A’ (w) m= = (1 7 tl) gtintannte ha c Al (w) etn tant du 


1+’ 


or 
A’ (w)(1+07)& ! = (1 - =) oe ce A’ (ue? 22" du. (1.35) 


As before, this will be ever positive, and, since 
: 1 + to\* 
Zin tan . ’ 
€ G — =) (1.36) 
it will also be rational. The same convergence factors which render 
A(w) increasing will render > £ A’ jx(w)a,;2, positive as an Hermitian 
j=l k=l 
form. 
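
The effect of the convergence factor in (1.30) and (1.31) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the test autocorrelation (a single spectral line at an arbitrarily chosen frequency `w0`) are illustrative assumptions. It compares the weighted sum of (1.31) with an abruptly truncated sum.

```python
import cmath
import math


def spectrum_estimate(phi, N, omega, taper=True):
    """Evaluate the sum of (1.31) at frequency omega.

    phi   -- hypothetical callable giving the autocorrelation at integer lag n,
             assumed Hermitian: phi(-n) == conj(phi(n))
    taper -- True applies the weights (1 - |n|/N); False truncates abruptly
    """
    total = 0j
    for n in range(-N + 1, N):
        weight = 1 - abs(n) / N if taper else 1.0
        total += weight * phi(n) * cmath.exp(-1j * n * omega)
    # For a Hermitian phi the sum is real up to rounding error.
    return total.real / math.sqrt(2 * math.pi)


def phi_line(n, w0=1.0):
    """Autocorrelation of the single spectral line exp(i*w0*k) (assumed example)."""
    return cmath.exp(1j * w0 * n)


grid = [-math.pi + k * 2 * math.pi / 400 for k in range(401)]
smallest_tapered = min(spectrum_estimate(phi_line, 8, w) for w in grid)
abrupt_value = spectrum_estimate(phi_line, 8, 1.6, taper=False)
print(smallest_tapered, abrupt_value)
```

For this example the weighted sum stays non-negative over the whole grid, while the abruptly truncated sum takes a negative value near the line, which is the behavior the weights are introduced to prevent.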


1.4 Ergodic Theory 
In order to produce examples of functions or sequences for which the
functions or sequences $\varphi(\tau)$ or $\varphi_j$ exist, let us appeal to ergodic theory,
as presented in the Introduction. If, in particular, $\Sigma$ is a set of elements
$P$, and $T$ is a measure-preserving transformation of $\Sigma$ into itself, or if,
more generally, $T^\lambda$ is a measure-preserving group of such
transformations, for which $\lambda$ is real, and

$$T^\lambda(T^\mu P) = T^{\lambda+\mu} P, \tag{1.40}$$

then, if $F(P)$ is a function of class $L^2$ throughout this set, we have that
in the discrete case

$$\varphi_m = \lim_{N\to\infty} \frac{1}{N+1} \sum_{n=0}^{N} F(T^{m+n}P)\,\overline{F(T^{n}P)} = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{n=-N}^{N} F(T^{m+n}P)\,\overline{F(T^{n}P)}, \tag{1.41}$$

or for the case of continuous phenomena

$$\varphi(\tau) = \lim_{A\to\infty} \frac{1}{A} \int_{0}^{A} F(T^{t+\tau}P)\,\overline{F(T^{t}P)}\,dt = \lim_{A\to\infty} \frac{1}{2A} \int_{-A}^{A} F(T^{t+\tau}P)\,\overline{F(T^{t}P)}\,dt. \tag{1.42}$$

In the first instance, the particular set $\Sigma$ defined assures the existence of
$\varphi_m$ and $\varphi(\tau)$ throughout the set of $P$ (except those of Lebesgue measure
zero) for a single $m$ or $\tau$. As we have indicated in the Introduction, the
same statement may be shown to hold simultaneously for all $m$ or $\tau$,
and the set of exceptional values of $P$ remains of zero measure. In the
continuous case, moreover, we have as before

$$\lim_{\epsilon\to 0} \varphi(\epsilon) = \varphi(0) \tag{1.43}$$

almost always. Thus the theory of generalized harmonic analysis which
we have here stated is applicable to almost all orbits in ergodic theory.
If the transformation $T$ or the transformation group $T^\lambda$ is metrically
transitive (or ergodic, as it is now the fashion to call it), we shall have
(almost always) that $\varphi_m$ is given by the integral throughout the set $\Sigma$

$$\varphi_m = \int F(T^{m}P)\,\overline{F(P)}\,dV_P, \tag{1.44}$$

when discrete; and when continuous,

$$\varphi(\tau) = \int F(T^{\tau}P)\,\overline{F(P)}\,dV_P. \tag{1.45}$$

This relationship existing between the generalized harmonic analysis of
this chapter and ergodic theory renders generalized harmonic analysis
a most useful tool in communication engineering and in the study of
time series.
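
As an illustration of (1.41) and (1.44), the sketch below compares a time average along a single orbit with the corresponding phase average. The example is an assumption chosen for convenience and does not come from the text: the transformation is the rotation $P \to P + \theta \pmod 1$ of the unit interval with an irrational $\theta$, and $F(P) = \cos 2\pi P$. For this case both averages should equal $\tfrac{1}{2}\cos 2\pi m\theta$.

```python
import math

THETA = math.sqrt(2) - 1  # irrational rotation angle (illustrative choice)


def F(p):
    """Illustrative function on the circle; cos is periodic, so no reduction mod 1 is needed."""
    return math.cos(2 * math.pi * p)


def time_average(m, p0, N):
    """(1/(N+1)) * sum_{n=0}^{N} F(T^{m+n} p0) F(T^n p0) along one orbit, as in (1.41)."""
    total = 0.0
    for n in range(N + 1):
        total += F(p0 + (m + n) * THETA) * F(p0 + n * THETA)
    return total / (N + 1)


def phase_average(m, M=1000):
    """Midpoint-rule approximation of the integral in (1.44) over the unit interval."""
    total = 0.0
    for k in range(M):
        p = (k + 0.5) / M
        total += F(p + m * THETA) * F(p)
    return total / M


print(time_average(3, 0.123, 20000), phase_average(3))
```

The starting point `0.123` and the lag `3` are arbitrary; any other starting point should give the same limit, which is the content of the ergodic statement for this transformation.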

One extremely important fact in ergodic theory, which we have already
indicated as resulting from invariance of the translation group in time,
is that, if $G$ is integrable, then almost always

$$\lim_{N\to\infty} \frac{1}{N+1} \sum_{n=0}^{N} G(T^{-n}P) = \lim_{N\to\infty} \frac{1}{N+1} \sum_{n=0}^{N} G(T^{n}P). \tag{1.46}$$

This allows us to identify averages made on the observable past of a
time series with averages to be subsequently obtained from the now
unattainable future. It is at precisely this point that the ensemble of
time series, as contrasted with the individual time series, becomes
important. This step (legitimate for the ensemble) is not legitimate for
the individual series. It is this step which constitutes the logical process
of induction.


1.5 Brownian Motion 


Now let us turn to the theory of the Brownian motion. The fundamental
formula behind this theory is that

$$\frac{1}{\sqrt{2\pi(t_1+t_2)}} \exp\left(-\frac{x^2}{2(t_1+t_2)}\right) = \frac{1}{2\pi\sqrt{t_1 t_2}} \int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2t_1} - \frac{(x-y)^2}{2t_2}\right) dy. \tag{1.50}$$

This means that, if the quantity $X$ has a Gaussian distribution, so that
the probability that it lies between $x$ and $x + dx$ is given by

$$\frac{1}{\sqrt{2\pi t_1}} \exp\left(-\frac{x^2}{2t_1}\right) dx \tag{1.51}$$

and if the quantity $Y$, completely independent of $X$, has likewise a
Gaussian distribution, so that the probability that it lies between $x$ and
$x + dx$ is

$$\frac{1}{\sqrt{2\pi t_2}} \exp\left(-\frac{x^2}{2t_2}\right) dx, \tag{1.52}$$

then the probability that the sum $X + Y$ lies between $x$ and $x + dx$
will be

$$\frac{1}{\sqrt{2\pi(t_1+t_2)}} \exp\left(-\frac{x^2}{2(t_1+t_2)}\right) dx. \tag{1.53}$$
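
The convolution identity (1.50) is easy to confirm numerically. The sketch below is an illustration only; the function names, the integration range, and the step size are assumptions, and a plain trapezoidal rule stands in for the exact integral.

```python
import math


def gaussian_density(x, t):
    """Density of (1.51): Gaussian with mean zero and variance t."""
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)


def convolved_density(x, t1, t2, half_width=30.0, step=0.01):
    """Trapezoidal approximation of the right-hand side of (1.50)."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        y = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * gaussian_density(y, t1) * gaussian_density(x - y, t2)
    return total * step


# Left-hand side of (1.50) against the numerical convolution, for arbitrary values.
print(gaussian_density(0.9, 0.7 + 1.8), convolved_density(0.9, 0.7, 1.8))
```

The two printed numbers should agree closely: the variances $t_1$ and $t_2$ add under the convolution, which is what lets a single parameter, the length of the time interval, describe the Brownian displacement.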


Now, in the Brownian motion, a particle moves in such a way that
the distance traversed by the particle in one interval of time has a
distribution independent of the movement of that particle in any
non-overlapping interval of time and dependent only on the length of that
interval of time. If the distribution of the $X$-distance traversed by the
particle in an interval of time of length $t$ is Gaussian, it can accordingly
have only the form

$$\frac{1}{\sqrt{2\pi\rho t}} \exp\left(-\frac{x^2}{2\rho t}\right) dx. \tag{1.54}$$

Let us normalize this by taking $\rho = 1$. The author has shown† that, if
we take $x(0)$ to be 0, and we regard the probability that

$$\xi \le x(t_2) - x(t_1) \le \xi + d\xi \qquad (t_1 < t_2) \tag{1.55}$$

as being

$$\frac{1}{\sqrt{2\pi(t_2 - t_1)}} \exp\left(-\frac{\xi^2}{2(t_2 - t_1)}\right) d\xi, \tag{1.56}$$

then it is possible to map sets of functions $x(t)$, simultaneously satisfying
conditions of the type

$$a_1 \le x(t_1) \le b_1;\ a_2 \le x(t_2) \le b_2;\ \cdots;\ a_n \le x(t_n) \le b_n; \qquad t_1 \le t_2 \le \cdots \le t_n, \tag{1.57}$$

upon measurable sets of points on the line $0 \le \alpha \le 1$, in such a way that
the probability of the simultaneous occurrence of any such set of
contingencies becomes equal to the measure of the set of corresponding
values of $\alpha$.

† See Proceedings of the National Academy of Sciences, Vol. 7, No. 9, pp. 253-260, September, 1921.

In other words, if we write $x(t, \alpha)$ for the $x(t)$ corresponding
to each $\alpha$, and put $t_0 = 0$, $x_0 = 0$, we have, as a measure of the
probability of each coincidence, the product

$$\frac{1}{(2\pi)^{n/2} \prod_{k=1}^{n} (t_k - t_{k-1})^{1/2}} \int_{a_1}^{b_1} dx_1 \cdots \int_{a_n}^{b_n} dx_n \exp\left(-\sum_{k=1}^{n} \frac{(x_k - x_{k-1})^2}{2(t_k - t_{k-1})}\right) \tag{1.575}$$

for the set of cases where $\{a_1 \le x(t_1, \alpha) \le b_1, \cdots, a_n \le x(t_n, \alpha) \le b_n\}$,
in which normalization is indicated by the partial product divisor, and
the symbol exp designates the exponential function of a complex
variable.‡ In this manner we may determine the integral (or average) of
other functions of $\alpha$, determined as functionals of $x(t, \alpha)$. This may be
extended easily to include values of $t$ on an infinite range from $-\infty$
to $\infty$. Here $x(-t, \alpha)$ and $x(t, \alpha)$ are taken to be projections of
independent Brownian motions. In particular, if $f_1(t), \cdots, f_n(t)$ are a set of


differentiable functions of class L?, and such that _f” | fx(t)|at < ©, 


and we define 


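$$\int_{-\infty}^{\infty} f_k(t)\, dx(t, \alpha) = -\int_{-\infty}^{\infty} x(t, \alpha)\, f_k'(t)\, dt, \tag{1.58}$$

then

$$\int_0^1 d\alpha \prod_{k=1}^{n} \int_{-\infty}^{\infty} f_k(t)\, dx(t, \alpha) = \sum \prod \int_{-\infty}^{\infty} f_j(t) f_k(t)\, dt, \tag{1.583}$$

where the partial product $\prod$ is one in which every number from 1 to $n$
inclusive appears just once as a $j$ or a $k$, and the sum $\sum$ is taken over
all such products. The result is 0 if $n$ is odd.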
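
The sum over pairings on the right of (1.583) can be made concrete with a short enumeration. This is a sketch under stated assumptions: the names `pairings` and `pairing_sum` are illustrative, and the matrix `gram` is assumed to hold the already computed integrals $\int f_j(t) f_k(t)\,dt$.

```python
def pairings(indices):
    """Yield every way of splitting the list `indices` into unordered pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def pairing_sum(gram):
    """Right-hand side of (1.583) for a symmetric matrix of inner products.

    gram[j][k] is assumed to equal the integral of f_j(t) f_k(t) dt.
    """
    n = len(gram)
    if n % 2 == 1:
        return 0.0  # no complete pairing exists for an odd number of factors
    total = 0.0
    for pairing in pairings(list(range(n))):
        product = 1.0
        for j, k in pairing:
            product *= gram[j][k]
        total += product
    return total


# Four identical functions with unit norm give 3 pairings, hence the value 3.
print(pairing_sum([[1.0] * 4 for _ in range(4)]))
```

With four identical functions of norm $s$ the sum is $3s^2$, the familiar fourth moment of a Gaussian variable of variance $s$; the number of pairings of $2p$ indices is $1 \cdot 3 \cdot 5 \cdots (2p-1)$.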

The transformation of $x(t, \alpha)$ into

$$x(t + \lambda, \alpha) - x(\lambda, \alpha) = x(t, \beta) \tag{1.585}$$

will generate a transformation of $\alpha$ into $\beta$ which will conserve measure on
the line (0, 1). Such a group as is formed by all these transformations
for all values of $\lambda$ will be exactly a case in which we may employ the
ergodic theorems. It may be shown, as a result of the independence of
non-overlapping intervals of $t$, that this group of transformations is
ergodic or metrically transitive. Thus almost always

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} d\tau \prod_{k=1}^{n} \int_{-\infty}^{\infty} f_k(t + \tau)\, dx(t, \alpha) = \sum \prod \int_{-\infty}^{\infty} f_j(t) f_k(t)\, dt. \tag{1.587}$$

‡ See Whittaker and Watson, Modern Analysis, p. 581.


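This gives us the spectrum of the response of a linear resonator to a
Brownian input, since if

$$f(t) = \int_{-\infty}^{\infty} g(t + \tau)\, dx(\tau, \alpha), \tag{1.590}$$

then $\varphi(\tau)$ will almost always be

$$\varphi(\tau) = \int_{-\infty}^{\infty} g(t + \tau)\, \overline{g(t)}\, dt, \tag{1.593}$$

and

$$\Lambda'(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega\tau}\, d\tau \int_{-\infty}^{\infty} g(t + \tau)\, \overline{g(t)}\, dt = \sqrt{2\pi} \left| \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-i\omega t}\, dt \right|^2. \tag{1.595}$$

In words, the response of a linear resonator to a unit Brownian motion
input has the same distribution of power in frequency that its response to
a single instantaneous pulse will have as a distribution of energy in
frequency.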
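
The two sides of this statement can be compared numerically for a simple resonator. The sketch below is illustrative only: the impulse response $g(t) = e^{-t}$ for $t \ge 0$ and all function names are assumptions, and a trapezoidal rule replaces the exact integrals. For this choice the autocorrelation (1.593) should be $\tfrac{1}{2}e^{-\tau}$ for $\tau \ge 0$, and the energy spectrum of the pulse response should be $1/(1+\omega^2)$.

```python
import cmath
import math


def g(t):
    """Assumed impulse response of the resonator: exp(-t) for t >= 0, zero before."""
    return math.exp(-t) if t >= 0 else 0.0


def trapezoid(func, a, b, n):
    """Composite trapezoidal rule; func may return real or complex values."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return total * h


def autocorrelation(tau, upper=40.0, n=40000):
    """Integral of g(t + tau) * g(t) dt as in (1.593), for tau >= 0 and real g."""
    return trapezoid(lambda t: g(t + tau) * g(t), 0.0, upper, n)


def energy_spectrum(omega, upper=40.0, n=40000):
    """Squared modulus of the integral of g(t) exp(-i omega t) dt."""
    G = trapezoid(lambda t: g(t) * cmath.exp(-1j * omega * t), 0.0, upper, n)
    return abs(G) ** 2


print(autocorrelation(0.8), 0.5 * math.exp(-0.8))
print(energy_spectrum(1.5), 1 / (1 + 1.5 ** 2))
```

Since $\tfrac{1}{2}e^{-|\tau|}$ and $1/(1+\omega^2)$ are a Fourier pair up to the normalizing constants of (1.595), the two checks together illustrate the statement for this particular $g$.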

It will be seen upon computation that the spectrum of

$$\int_{-\infty}^{\infty} g(t + \tau)\, dx(\tau, \alpha) \tag{1.597}$$

will be dependent only on $\left| \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-i\omega t}\, dt \right|$. From this fact alone,
it does not necessarily follow that all the statistical parameters of the
distribution of $\int_{-\infty}^{\infty} g(t + \tau)\, dx(\tau, \alpha)$ will depend only on the spectrum
of the function, but that such is actually the case may be proven by
(1.587). These functions generated by the response of a linear resonator
to a Brownian input have their spectrum as their sole statistical
parameter. In general, a function with a spectrum not belonging to such a
restricted class has many other independent statistical parameters as
well. This is one of the principal reasons why our theory, which secures
only an optimum linear filter or predictor in the general case, secures an
absolutely optimum one in this special class of inputs.
1.6 Poisson Distributions


Another interesting type of random distribution is that of Poisson.
Let us assume that on the infinite line $-\infty < x < \infty$ the probability
that a given segment of length $l$ does not contain a bullet hole is

$$e^{-Al} \tag{1.60}$$

and that these probabilities are independent for non-overlapping
intervals. It may be shown that these assumptions are consistent, and
that the probability that a segment of length $l$ contain exactly $\nu$ bullet
holes will be

$$\frac{(Al)^{\nu} e^{-Al}}{\nu!}. \tag{1.61}$$
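
A small numerical check of (1.60) and (1.61) is sketched below. The function names and the chosen values of $A$ and the segment lengths are illustrative assumptions. The second function combines two adjacent segments under the independence assumption, so that agreement with (1.61) for the combined length illustrates the consistency claimed in the text.

```python
import math


def hole_count_probability(A, length, nu):
    """(1.61): probability of exactly nu holes in a segment of the given length."""
    mean = A * length
    return mean ** nu * math.exp(-mean) / math.factorial(nu)


def combined_count_probability(A, l1, l2, nu):
    """Probability of nu holes in two adjacent segments treated as independent."""
    return sum(
        hole_count_probability(A, l1, k) * hole_count_probability(A, l2, nu - k)
        for k in range(nu + 1)
    )


# nu = 0 reproduces (1.60); the two-segment value matches (1.61) for length l1 + l2.
print(hole_count_probability(2.0, 1.5, 0), math.exp(-3.0))
print(combined_count_probability(2.0, 0.5, 1.0, 4), hole_count_probability(2.0, 1.5, 4))
```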


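As in the Brownian case, the contingencies of bullet-hole distribution
may be mapped on a probability interval (0, 1) of a parameter $\alpha$.
Only a set of values of $\alpha$ of zero measure will correspond to the cases in
which some finite interval of the line contains more than a finite number
of bullet holes. If the $X$-coordinates of all the bullet holes are $x_1(\alpha)$,
$\cdots$, $x_n(\alpha)$, $\cdots$, and if we consider $x_n(\alpha)$ and integrals with respect to $\alpha$,
we shall have

$$\int_0^1 \sum_n f(x_n(\alpha))\, d\alpha = A \int_{-\infty}^{\infty} f(x)\, dx \tag{1.62}$$

whenever $f(x)$ belongs to the class of Lebesgue integrable functions over
$(-\infty, \infty)$. Furthermore, if $f(x)$ and $g(x)$ are Lebesgue integrable and
of class $L^2$ on the infinite line,

$$\int_0^1 \Big[\sum_n f(x_n(\alpha))\Big] \Big[\sum_m \overline{g(x_m(\alpha))}\Big]\, d\alpha = A \int_{-\infty}^{\infty} f(x)\, \overline{g(x)}\, dx + A^2 \int_{-\infty}^{\infty} f(x)\, dx \int_{-\infty}^{\infty} \overline{g(x)}\, dx. \tag{1.63}$$

The change of $\{x_n(\alpha)\}$ into $\{x_n(\alpha) + \lambda\}$ generates a measure-preserving
transformation $T^\lambda$ of the line $0 \le \alpha \le 1$ into itself, and, as before, it
may be shown that this transformation is ergodic. Thus, if $f(x)$ belongs
to $L^2$ and also to the class of Lebesgue integrable functions, we shall
have for almost all sets $x_n(\alpha)$,

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \Big[\sum_n f(x_n(\alpha) + t + \tau)\Big] \Big[\sum_m \overline{f(x_m(\alpha) + t)}\Big]\, dt = A \int_{-\infty}^{\infty} f(x + \tau)\, \overline{f(x)}\, dx + A^2 \left| \int_{-\infty}^{\infty} f(x)\, dx \right|^2. \tag{1.64}$$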
The spectrum of $\sum_n f(x_n(\alpha) + t)$, as a function of $t$, will consist in a line
of intensity $\sqrt{2\pi}\, A^2 \left| \int_{-\infty}^{\infty} f(t)\, dt \right|^2$ at $\omega = 0$, together with a continuous
spectral distribution for which

$$\Lambda'(\omega) = A \sqrt{2\pi} \left| \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt \right|^2. \tag{1.65}$$

Finally, let us consider the transformation of the infinite product
space

$$\cdots,\ 0 \le x_{-n} \le 1,\ \cdots,\ 0 \le x_0 \le 1,\ \cdots,\ 0 \le x_n \le 1,\ \cdots \tag{1.66}$$

constituted by transforming $x_k$ into $x_{k+1}$. This leaves invariant the
infinite product measure, which is isomorphic with respect to Lebesgue
measure. It is, furthermore, ergodic. Hence almost always, if


x | f(z;) |? <%, 
‘ 1 ay, ; fs cn amae 
anid NGI ey = F(Zj4n4m)f(Zipm) = X f(ttn)f(). (1.67) 
This gives us as the spectrum of 


Es Cin) (1.68) 





that determined by the function 





A’(w) = a x f(xje*" : (1.69) 


1.7 Harmonic Analysis in the Complex Domain 


Let us now turn from ergodic theory and the real aspects of harmonic
analysis to the complex aspects of that theory. These are quite simple.
If

$$\int_{-\infty}^{\infty} |f(t)|^2 e^{-2xt}\, dt < \text{const} \tag{1.70}$$

when $x$ varies over the range $x_1 < x < x_2$, then the analytic function

$$G(x + iy) = \int_{-\infty}^{\infty} f(t)\, e^{-(x+iy)t}\, dt \tag{1.71}$$

is defined over the range $x_1 < x < x_2$, and uniformly over that range,

$$\int_{-\infty}^{\infty} |G(x + iy)|^2\, dy < \text{const}; \tag{1.72}$$

and, vice versa, if $G(x + iy)$ is any analytic function satisfying this
last condition, we may represent it in terms of $f$ as stated, while $f$ will
itself be subject to the condition first stated. If

$$\int_{-\infty}^{\infty} |f(t)|^2 e^{-2xt}\, dt < \text{const} \tag{1.73}$$

when $x$ varies over the semi-infinite range $x_1 < x < \infty$, then the analytic
function

$$G(x + iy) = \int_{-\infty}^{\infty} f(t)\, e^{-(x+iy)t}\, dt \tag{1.74}$$

is defined over the range $x_1 < x < \infty$, and uniformly over that range

$$\int_{-\infty}^{\infty} |G(x + iy)|^2\, dy < \text{const}. \tag{1.75}$$

This can be the case only if $f(t)$ vanishes for negative values of $t$. The
transition from $G(x + iy)$ to an $f(t)$ of this sort is made as before. If
the range of $x$ is $-\infty < x < x_2$, $f(t)$ vanishes for positive values of $t$.
In each case, $G(x + iy)$ is uniformly bounded for any range of $x$ interior
to that for which $\int_{-\infty}^{\infty} |f(t)|^2 e^{-2xt}\, dt$ is bounded.


Although we do not make explicit use of it in this book, the theorem
that a bounded function of a complex variable is a constant plays a
very important role in our researches. It is this theorem which permits
a sufficient restriction on the behavior of a function in each half-plane
separately, to determine the function completely. The technique of
determining a function by subjecting it to separate restrictions in the
two half-planes is the basis of our solution of the fundamental integral
equation to which our predicting and filtering problems lead.

Finally, let us say a word about the factoring of a function (defined
along the real axis) into the product of two analytic functions, each
free from zeros and singularities in one half-plane. Let $\Phi(\omega)$ be a function
of the real variable $\omega$, and let

$$\int_{-\infty}^{\infty} \frac{\big|\log |\Phi(\omega)|\big|}{1 + \omega^2}\, d\omega < \infty. \tag{1.76}$$
Let us put | | 
1 log | (w) ois q 
= fo ifw — (u + 2)) ai = E(u + w). (1.765) 
Then the function 


E(u + iv) (1.77) 
will be analytic in the half-plane $v > 0$, and its real part
1 i‘, elog | #(w) elog |S) | 
ee 
(-u)?+é 
on the line » = ¢ > 0 will converge in the mean over every finite interval 
to log | (uz) | as «0. Let us put 
W(u + iv) = eft), (1.775) 
Then ¥(u + iv) will be free from zeros and singularities in the lower 
half-plane, as will be ¥(u + zv) in the upper half-plane. If we put 
V(u) = Lim, V(u — te), (1.78) 
«0 


the limit-in-the-mean being taken over an arbitrary finite interval, then,
for real $\omega$, we shall have almost everywhere

$$\Phi(\omega) = |\Psi(\omega)|^2. \tag{1.785}$$

Clearly, if $\Phi(\omega)$ is an even rational function, real and positive on the
axis of reals, we may factor it into the product

$$\Phi(\omega) = \Psi_1(\omega) \cdot \Psi_2(\omega), \tag{1.79}$$

where all the zeros and poles of $\Psi_1(\omega)$ lie above the axis of reals, and all
the zeros and poles of $\Psi_2(\omega)$ lie below the axis of reals. It may be shown
that if 


2, 


Wi(w) = c¥(w), Ve(w) = (1.793) 
and also | | | e 
° Jlog | ¥(w) 
'e ee = 0, (1.795) 


then in fact no $\Psi(\omega)$ free from zeros and singularities in the lower
half-plane exists, such that over every finite interval of the real axis $\Phi(\omega)$ is
the limit-in-the-mean of


Uo — ie) - Vo + te) (1.799) 
as e— 0. 

We shall see later that, when the function $\Phi(\omega)$ is not factorable as in
(1.799) or, what is equivalent, when (1.795) holds, the future of the
function $f$ from which $\Phi$ is obtained is determinable completely in terms
of its own past. A simple example of the sort is when $f$ is analytic. In all
practical cases the auto-correlation coefficient of a message is not
completely determined by its own past. If it were so determined, then at no
period in the message would it be possible to introduce new information.
Thus the case of (1.795) is in a certain sense singular and is so described
by Kolmogoroff in his paper Interpolation und Extrapolation von
stationären zufälligen Folgen, Bulletin de l’académie des sciences de
U.R.S.S., Ser. Math. 5, 1941.


Chapter II


THE LINEAR PREDICTOR FOR A SINGLE 
TIME SERIES 


2.01 Formulation of the Problem of the Linear Predictor 


The present chapter will be devoted to the study of what is statistically
the simplest case of prediction, that for the single time series. In general,
the time series we study will be complex-valued. Certain formal
advantages justify this generality; nevertheless, in all practical applications,
we shall be concerned with the real-valued case. A similar situation
obtains in the field of electrical engineering, where the classical theory
treats alternating voltages and currents, for reason of formal simplicity,
as real parts of fictitious complex voltages and currents.

In the first instance we shall discuss the continuous time series $f(t)$,
and at a later stage the discrete series. The "norm" or absolute quadratic
average of such a series will be the mean square of its absolute value or,
in explicit form,

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |f(t)|^2\, dt.$$

We have said that the methods of prediction contemplated are to be
linear, invariant with respect to the choice of an origin in time, and
dependent only on the past and present of the function under
investigation. Examples of operators capable of making such predictions are:

The derivative $f'(t)$;
The ordinary integral $\int_0^{\infty} f(t - \tau) K(\tau)\, d\tau$;
The Stieltjes integral $\int_0^{\infty} f(t - \tau)\, dK(\tau)$;
The integral involving higher derivatives $\sum_{\nu=1}^{N} \int_0^{\infty} f^{(\nu)}(t - \tau)\, dK_\nu(\tau)$.


It will develop that the relevant statistical parameters of time series
will (under these circumstances) be confined to the auto-correlation
coefficient

$$\varphi(\tau) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau)\, \overline{f(t)}\, dt. \tag{2.011}$$


We concern ourselves only with the prediction of ensembles for which 
nearly all functions have the same auto-correlation coefficient. 

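Now let $f(t)$ denote a bounded time series for which $\varphi(\tau)$ exists, and
let $\varphi(\tau)$ be continuous in the vicinity of $\tau = 0$. Let $K$ be of finite total
variation. The expression $f(t + \alpha)$ will denote the value of $f$ at a period
$\alpha$ units of time later than $t$, while $\int_0^{\infty} f(t - \tau)\, dK(\tau)$ will denote the
result of applying to $f$ an as yet imperfectly determined linear operator
on its past. Thus, from the point of view of the theory of least squares,
as has been indicated in paragraphs (0.7) and (1.1),

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \left| f(t + \alpha) - \int_0^{\infty} f(t - \tau)\, dK(\tau) \right|^2 dt$$

gives an estimate of the extent to which the operator fails to predict the
future value of $f(t)$ after a lead of $\alpha$ units of time. By our lemmas
(1.161, 1.163), this may become

$$\begin{aligned}
\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} & \left| f(t + \alpha) - \int_0^{\infty} f(t - \tau)\, dK(\tau) \right|^2 dt \\
&= \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |f(t + \alpha)|^2\, dt
 - 2R\left\{ \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t + \alpha)\, dt \int_0^{\infty} \overline{f(t - \tau)}\, d\overline{K(\tau)} \right\} \\
&\qquad + \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} dt \int_0^{\infty} f(t - \tau)\, dK(\tau) \int_0^{\infty} \overline{f(t - \sigma)}\, d\overline{K(\sigma)} \\
&= \varphi(0) - 2R\left\{ \int_0^{\infty} \varphi(\alpha + \tau)\, d\overline{K(\tau)} \right\}
 + \int_0^{\infty} dK(\sigma) \int_0^{\infty} d\overline{K(\tau)}\, \varphi(\tau - \sigma).
\end{aligned} \tag{2.012}$$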


2.02 The Minimization Problem 


The formal minimization of expression (2.012) is obtained by adding
to $K(\tau)$ the expression $\epsilon[\delta K(\tau)]$, differentiating (2.012) with respect to $\epsilon$,
equating this derivative to zero, and then allowing $\epsilon$ to approach zero.
As a result we get

$$\int_0^{\infty} \left[ \varphi(\alpha + \tau) - \int_0^{\infty} \varphi(\tau - \sigma)\, dK(\sigma) \right] d\,\delta\overline{K(\tau)} = 0 \tag{2.0205}$$

for all admissible $\delta K(\tau)$. Formally this leads to

$$\varphi(\alpha + \tau) = \int_0^{\infty} \varphi(\tau - \sigma)\, dK(\sigma) \quad \text{for } \tau > 0. \tag{2.0207}$$


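However, in order to show that (2.0207) gives us a true minimum rather
than a mere stationary solution, we proceed as follows: we put

$$\varphi(\alpha + \tau) = \int_0^{\infty} \varphi(\tau - \sigma)\, dQ(\sigma) \quad \text{for } \tau > 0. \tag{2.021}$$

Then

$$\begin{aligned}
\varphi(0) &- 2R\left\{ \int_0^{\infty} \varphi(\alpha + \tau)\, d\overline{K(\tau)} \right\} + \int_0^{\infty} dK(\sigma) \int_0^{\infty} d\overline{K(\tau)}\, \varphi(\tau - \sigma) \\
&= \varphi(0) - \int_0^{\infty} \varphi(\alpha + \tau)\, d\overline{Q(\tau)}
 + \int_0^{\infty} d[K(\sigma) - Q(\sigma)] \int_0^{\infty} d[\overline{K(\tau)} - \overline{Q(\tau)}]\, \varphi(\tau - \sigma) \\
&= \varphi(0) - \int_0^{\infty} \varphi(\alpha + \tau)\, d\overline{Q(\tau)}
 + \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} dt \left| \int_0^{\infty} f(t - \tau)\, d[K(\tau) - Q(\tau)] \right|^2.
\end{aligned} \tag{2.022}$$

Since the last term is non-negative, the whole expression (2.022) is
greater than or equal to

$$\varphi(0) - \int_0^{\infty} \varphi(\alpha + \tau)\, d\overline{Q(\tau)},$$

provided $Q$ satisfies the conditions which we have already laid down for
$K$.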
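
A discrete-time, finite-memory analogue of equation (2.0207) can be solved directly, which may help fix ideas. The sketch below is not the method of the text: it replaces the integral equation by the linear system $\sum_{k=0}^{M-1} \varphi_{j-k} K_k = \varphi_{j+\alpha}$ for $j = 0, \ldots, M-1$, assumes a real autocorrelation, and uses an illustrative autocorrelation $\varphi_m = \rho^{|m|}$. All names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting (plain Python, small systems)."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = a[i][n] - sum(a[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def discrete_predictor(phi, lead, memory):
    """Weights K_0..K_{memory-1} and the resulting mean square error.

    phi    -- callable giving a real autocorrelation at integer lag
    lead   -- number of steps ahead to predict (the alpha of the text)
    memory -- number of past values used (a finite stand-in for the infinite past)
    """
    matrix = [[phi(j - k) for k in range(memory)] for j in range(memory)]
    rhs = [phi(j + lead) for j in range(memory)]
    weights = solve_linear_system(matrix, rhs)
    # Discrete counterpart of phi(0) minus the integral of phi(alpha + tau) dQ(tau).
    error = phi(0) - sum(phi(lead + k) * w for k, w in enumerate(weights))
    return weights, error


weights, error = discrete_predictor(lambda m: 0.6 ** abs(m), lead=2, memory=5)
print(weights, error)
```

For this autocorrelation the solution puts all its weight on the most recent value, $K_0 = \rho^{\alpha}$, and the error is $1 - \rho^{2\alpha}$, in line with the bound that follows from (2.022).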

Here, as throughout the paper, where a contrary assumption is not
explicitly made, we shall assume that the integrated spectrum $\Lambda(\omega)$ is
absolutely continuous, and shall write

$$\Phi(\omega) = \sqrt{2\pi}\, \Lambda'(\omega). \tag{2.023}$$

We may, in fact, confine ourselves for all practical purposes to the
case in which $\Lambda(\omega)$ is absolutely continuous. The simplest exception to
the absolute continuity of $\Lambda(\omega)$ is the case in which $\Lambda(\omega)$ has one or more
jumps. In this case, $f(t)$ contains a part of the form $\sum_{n=1}^{N} A_n e^{i\lambda_n t}$. For such
a function, the phase relations between the different components at
present or in the past determine with perfect strictness the phase
relations into the indefinite future. The times at which $\sin(t)$ goes through
zero are determined for all eternity by any one of them. This is not the
behavior which we expect of a telephone message or of an economic time
series. It is only such terms as depend on an external rigidly periodic 
influence, like the day or the year, which show even an approximation to 
such a behavior. A periodicity in which the phase relations gradually 
alter through the course of the ages is not in fact a true periodicity and 
does not correspond to a sharp jump of A(w), but to a very rapid rise. 
It is a fact that the periods of the communications engineer are always 
more-or-less periods, never precise periods, and therefore have absolutely 
continuous spectra. 

If the jump spectrum is an idealization, never perfectly realized in 
practice, this is even more the case with the continuous spectrum 
which is not absolutely continuous. In both cases, to establish the exist- 
ence of such a spectrum presupposes an infinitely long run of observa- 
tions. In both cases, according to the work of Kolmogoroff,* the past of a 


* A. N. Kolmogoroff, Interpolation und Extrapolation, Bulletin de l’académie des
sciences de U.R.S.S., Ser. Math. 5, pp. 3-14, 1941. Kolmogoroff’s work is of earlier
origin than ours but is devoted to a slightly different question and consequently uses
results which are more general but less specific. In both cases the object of study is
the optimum prediction. In our case we actually obtain a function which we
designate as the coefficient function $K(t)$ to be used directly in making such an optimum
prediction. Incidentally, we develop an expression for the mean square error of the
optimum prediction. Furthermore, we discuss the continuous as well as the discrete
case in prediction. Kolmogoroff discusses only the discrete case of prediction and the
mean square error in this case. In contrast to our approach, Kolmogoroff concerns
himself with the more general case where $\Lambda(\omega)$ is not necessarily absolutely continuous.
On the other hand, we discuss only absolutely continuous values of A(w). Now, if 
A(w) is not absolutely continuous, there is in general no unique optimum method of 
prediction based on non-singular integral or differential operators. The operators 
which furnish such an optimum prediction do not assume as convenient a form on 
Kolmogoroff’s more general basis as on ours, but often appear as means or averages; 
so that it is only natural that Kolmogoroff proceeds only so far as the greatest lower 
bound of the mean square error of prediction, while we actually obtain the optimum 
predicting operators. 

The author wishes to comment on the historical relation between the present work 
and that of Kolmogoroff. The present investigation was initiated in the early winter 
of 1940 as an attempt to solve an engineering problem. At that time and until the 
last week of 1941, by which time the paper was substantially complete, the author 
was not aware of the results of Kolmogoroff’s work and scarcely aware of its existence; 
although Professor Feller of Brown University had mentioned Kolmogoroff’s re- 
searches in casual conversation at an earlier period. Mr. I. E. Segal of the Princeton 
Graduate School brought Kolmogoroff’s work to the author’s attention at the 
Christmas meeting of the American Mathematical Society for 1941. Thus it would 
appear that the work of Kolmogoroff and that of the present writer represent two 
entirely separate attacks on the problem of time series, and that the parallelism 
between them may be attributed to the simple fact that the theory of the stochastic 
process had advanced to the point where the study of the prediction problem was the 
next thing on the agenda. 
function with a spectrum crowded into a set of frequencies of zero
measure determines its future for an infinite time. However, there are
many well-known cases in physics and engineering where the discrete
spectrum has been a useful approximation to the observed situation, 
while no case has yet been found in any such science where the non- 
absolutely continuous spectrum has proved a valuable tool. We say this 
without prejudice to the future. 
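Thus, on our assumptions,

$$\begin{aligned}
\int_0^{\infty} d[K(\sigma) - Q(\sigma)] & \int_0^{\infty} d[\overline{K(\tau)} - \overline{Q(\tau)}]\, \varphi(\tau - \sigma) \\
&= \frac{1}{2\pi} \int_0^{\infty} d[K(\sigma) - Q(\sigma)] \int_0^{\infty} d[\overline{K(\tau)} - \overline{Q(\tau)}] \int_{-\infty}^{\infty} \Phi(\omega)\, e^{i\omega(\tau - \sigma)}\, d\omega
\end{aligned}$$

if $K$ and $Q$ are of limited total variation. Thus, if $\Phi(\omega)$ has no interval
over which it vanishes, we can have

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi(\omega) \left| \int_0^{\infty} e^{-i\omega\tau}\, d[K(\tau) - Q(\tau)] \right|^2 d\omega
 = \int_0^{\infty} d[K(\sigma) - Q(\sigma)] \int_0^{\infty} d[\overline{K(\tau)} - \overline{Q(\tau)}]\, \varphi(\tau - \sigma) = 0 \tag{2.026}$$

when and only when $K$ and $Q$ are equivalent. In this case and in this
case only,

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \left| f(t + \alpha) - \int_0^{\infty} f(t - \tau)\, dK(\tau) \right|^2 dt
 = \varphi(0) - \int_0^{\infty} \varphi(\alpha + \tau)\, d\overline{Q(\tau)}, \tag{2.027}$$

and as a consequence,

$$\varphi(0) \ge \int_0^{\infty} \varphi(\alpha + \tau)\, d\overline{Q(\tau)}. \tag{2.028}$$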


2.03 The Factorization Problem 
We now wish to solve equation (2.0207), which we know to have not
more than one admissible solution. As at the end of the last chapter, we
factor $\Phi(u)$ into the product $|\Psi(u)|^2$ and, employing the same notation,
over every finite range


ae 1 fe _log |e) | 
V(u) = ss exp (; { éo| ; (2.031) 


rida —i{y—u)—e 


and in the lower half-plane, if we assume (1.795) to be false, which makes 
as}. (2.032)


* log] eo) | 


this factoring possible, 
theo Creal 


Vu+w) = ep {= f 


The function Y(u -+ zv) will be free from singularities and zeros in the 


W(u + wv)e” 


lower half-plane, while over any finite range of u, 
is bounded for » >. As we have already seen, if we can arrange ®(u) 


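as the quotient of the products

$$\Phi(\omega) = A^2\, \frac{\prod_{k=1}^{n} (\omega - \omega_k)(\omega - \overline{\omega_k})}{\prod_{k=1}^{m} (\omega - \omega_k')(\omega - \overline{\omega_k'})} \qquad (A \text{ real};\ m > n), \tag{2.033}$$

where the imaginary parts of each $\omega_k$ and $\omega_k'$ are positive, then

$$\Psi(\omega) = A\, \frac{\prod_{k=1}^{n} (\omega - \omega_k)}{\prod_{k=1}^{m} (\omega - \omega_k')}, \tag{2.034}$$

and $\Psi(\omega)$ will belong to $L^2$ on the real axis, as it has a denominator at
least one degree higher than the numerator. Of course, in the general
case, since

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi(\omega)\, d\omega = \varphi(0) < \infty, \tag{2.035}$$

$\Psi(\omega)$ will belong to $L^2$. The function

$$\psi(t) = \underset{B\to\infty}{\text{l.i.m.}}\ \frac{1}{2\pi} \int_{-B}^{B} \Psi(\omega)\, e^{i\omega t}\, d\omega \tag{2.036}$$

will accordingly exist; and since $\Psi(u + iv)$
is of less than exponential growth as $v \to -\infty$, we must have

$$\psi(t) = 0, \qquad t < 0$$

except at a set of points of zero measure. Certainly, $\psi(t)$ belongs to $L^2$.
Thus we may conclude that for almost all values of $\beta$

$$F_\beta(t) = \int_{-\infty}^{\infty} \psi(t - \tau)\, dx(\tau, \beta) \tag{2.037}$$

will exist, and we shall have

$$\begin{aligned}
\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} F_\beta(t + \sigma)\, \overline{F_\beta(t)}\, dt
&= \int_0^1 d\beta\, F_\beta(t + \sigma)\, \overline{F_\beta(t)} \\
&= \int_{-\infty}^{\infty} \psi(t + \sigma)\, \overline{\psi(t)}\, dt \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} |\Psi(\omega)|^2 e^{i\omega\sigma}\, d\omega = \varphi(\sigma).
\end{aligned} \tag{2.038}$$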


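Now,

$$F_\beta(t + \alpha) = \int_{-\infty}^{t} \psi(t + \alpha - \tau)\, dx(\tau, \beta) + \int_{t}^{t+\alpha} \psi(t + \alpha - \tau)\, dx(\tau, \beta) = P_\beta(t) + R_\beta(t),$$

where $P_\beta$ involves only the past or present of $x(t, \beta)$, and $R_\beta$ only its
future. On the other hand,

$$\int_0^{\infty} F_\beta(t - \sigma)\, dK(\sigma) = \int_0^{\infty} dK(\sigma) \int_{-\infty}^{t-\sigma} \psi(t - \sigma - \tau)\, dx(\tau, \beta)
 = \int_{-\infty}^{t} dx(\tau, \beta) \int_0^{t-\tau} \psi(t - \tau - \sigma)\, dK(\sigma),$$

which last involves only the present and past of $x(t, \beta)$. Thus, because
of the independence of different ranges of $x(t, \beta)$, then for almost all $\beta$,

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} R_\beta(t)\, dt \int_0^{\infty} \overline{F_\beta(t - \tau)}\, d\overline{K(\tau)} = 0. \tag{2.0381}$$

If then we can solve the equation

$$\int_0^{t} \psi(t - \tau)\, dK(\tau) = \psi(t + \alpha), \qquad t \ge 0, \tag{2.0382}$$

the $K$ we obtain will minimize

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \left| F_\beta(t + \alpha) - \int_0^{\infty} F_\beta(t - \tau)\, dK(\tau) \right|^2 dt; \tag{2.0383}$$

and, since this minimization is unique, and $F_\beta(t)$ and $f(t)$ share the same
$\varphi(t)$, we have solved

$$\varphi(\alpha + \tau) = \int_0^{\infty} \varphi(\tau - \sigma)\, dK(\sigma) \qquad (\tau > 0). \tag{2.039}$$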
To solve equation (2.0382), let us put

$$\int_0^{\infty} e^{-i\omega t}\, dK(t) = k(\omega). \tag{2.0391}$$

If we make a formal Fourier transformation of both sides, this yields

$$\int_0^{\infty} \psi(t + \alpha)\, e^{-i\omega t}\, dt = \Psi(\omega)\, k(\omega), \tag{2.0392}$$

where $\alpha$ is a finite constant; with the formal solution

$$k(\omega) = \frac{1}{\Psi(\omega)} \int_0^{\infty} \psi(t + \alpha)\, e^{-i\omega t}\, dt. \tag{2.0393}$$

Clearly the function $k(\omega)$ will have no singularities in the half-plane
below the real axis, nor can it grow exponentially there, so that one
requirement for $k(\omega)$ is fulfilled. It remains to be ascertained whether
$K(t)$ is of finite total variation. This will be the case if the numerator
and the denominator of (2.0393) are rational functions, and if also the 
numerator is not of higher degree than the denominator. 
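Let us now investigate this numerator. If

$$\Psi(\omega) = \sum_{m,n} \frac{a_{mn}}{(\omega - \omega_n)^m}, \tag{2.0394}$$

then

$$\psi(t) = \frac{1}{2\pi} \sum_{m,n} a_{mn} \int_{-\infty}^{\infty} \frac{e^{i\omega t}}{(\omega - \omega_n)^m}\, d\omega
 = \sum_{m,n} a_{mn}\, \frac{i^m t^{m-1} e^{i\omega_n t}}{(m-1)!} \quad \text{if } t > 0; \text{ and } = 0 \text{ if } t < 0. \tag{2.0395}$$

This may be proved by the use of Cauchy’s theorem or by observing
that

$$\int_0^{\infty} \frac{i^m t^{m-1} e^{i\omega_n t}}{(m-1)!}\, e^{-i\omega t}\, dt = \frac{1}{(\omega - \omega_n)^m}, \tag{2.0396}$$

and that hence, by the Plancherel theorem,

$$\underset{B\to\infty}{\text{l.i.m.}}\ \frac{1}{2\pi} \int_{-B}^{B} \frac{e^{i\omega t}}{(\omega - \omega_n)^m}\, d\omega
 = \begin{cases} \dfrac{i^m t^{m-1} e^{i\omega_n t}}{(m-1)!} & \text{if } t > 0, \\ 0 & \text{if } t < 0. \end{cases} \tag{2.0397}$$

Then

$$\begin{aligned}
\int_0^{\infty} \psi(t + \alpha)\, e^{-i\omega t}\, dt
&= \int_0^{\infty} e^{-i\omega t}\, dt \sum_{m,n} a_{mn}\, \frac{i^m (t + \alpha)^{m-1}}{(m-1)!}\, e^{i\omega_n (t + \alpha)} \\
&= \sum_{m,n} a_{mn}\, \frac{i^m e^{i\omega_n \alpha}}{(m-1)!} \sum_{k=0}^{m-1} \frac{(m-1)!}{k!\,(m-1-k)!}\, \alpha^{m-1-k} \int_0^{\infty} t^k e^{i(\omega_n - \omega) t}\, dt \\
&= \sum_{m,n} a_{mn}\, e^{i\omega_n \alpha} \sum_{k=0}^{m-1} \frac{(i\alpha)^{m-1-k}}{(m-1-k)!\,(\omega - \omega_n)^{k+1}}.
\end{aligned} \tag{2.0398}$$

Thus

$$k(\omega) = \frac{\displaystyle \sum_{m,n} a_{mn}\, e^{i\omega_n \alpha} \sum_{k=0}^{m-1} \frac{(i\alpha)^{m-1-k}}{(m-1-k)!\,(\omega - \omega_n)^{k+1}}}{\displaystyle \sum_{m,n} \frac{a_{mn}}{(\omega - \omega_n)^m}}. \tag{2.0399}$$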

That is, if $\Phi(\omega)$ is a rational function, with denominator of higher degree
than the numerator, and has no real poles nor zeros, we have a method
of determining the formal expression for k(w) in which we do not depart 
from the w-axis at all. The final k(w) which we obtain may be trans- 
formed back into a K(t), as will be the general procedure in statistical 
theory and practice; or we may use it directly for specifying the char- 
acteristic of an electrical network or mechanical structure to accomplish 
our work of prediction. Let it be noted that as the poles of k(w) are all 
above the real axis, there will always exist a passive four-terminal net- 
work with a multiple of k(w) as its voltage transfer ratio, and that the 
construction of such networks belongs to a branch of communication 
engineering having a well-established technique. 
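
Formula (2.0399) translates directly into a few lines of code. The sketch below is illustrative: the function name and the representation of the partial-fraction expansion (2.0394) as a list of triples `(a, m, omega_n)` are assumptions. It evaluates $k(\omega)$ at a single real frequency.

```python
import cmath
import math


def predictor_transfer(terms, alpha, omega):
    """Evaluate k(omega) from (2.0399).

    terms -- list of (a, m, omega_n), meaning Psi(omega) = sum a / (omega - omega_n)**m,
             with every omega_n assumed to lie above the real axis
    alpha -- lead of the prediction
    omega -- real frequency at which k is evaluated
    """
    numerator = 0j
    denominator = 0j
    for a, m, omega_n in terms:
        denominator += a / (omega - omega_n) ** m
        inner = sum(
            (1j * alpha) ** (m - 1 - k)
            / (math.factorial(m - 1 - k) * (omega - omega_n) ** (k + 1))
            for k in range(m)
        )
        numerator += a * cmath.exp(1j * omega_n * alpha) * inner
    return numerator / denominator


# Psi(omega) = 1/(1 + i*omega) = -i/(omega - i): a single simple pole at omega = i.
print(predictor_transfer([(-1j, 1, 1j)], alpha=0.7, omega=1.3), math.exp(-0.7))
```

For the simple pole at $\omega = i$ the result is the constant $e^{-\alpha}$, independent of $\omega$. For the double pole $\Psi(\omega) = -1/(\omega - i)^2$ the same function returns $e^{-\alpha}[\,i\alpha(\omega - i) + 1\,]$, which grows with $\omega$ and so corresponds to an operator involving differentiation.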


2.04 The Predictor Formula 


It is possible to express all the processes leading up to (2.0393) within
the compass of a single formula:

$$k(\omega) = \frac{1}{2\pi \Psi(\omega)} \int_0^{\infty} e^{-i\omega t}\, dt \int_{-\infty}^{\infty} \Psi(u)\, e^{iu(t + \alpha)}\, du. \tag{2.041}$$

As will be seen, we first work with $\Psi(\omega)$, taking its Fourier transform,
advance it an amount $\alpha$, discard the part corresponding to negative
values of $t$, transform back into a function of $\omega$, and divide by $\Psi(\omega)$.
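The mean square error of the prediction determined by $k(\omega)$ will be

$$\begin{aligned}
\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} & \left| f(t + \alpha) - \int_0^{\infty} f(t - \tau)\, dK(\tau) \right|^2 dt \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} |\Psi(\omega)|^2\, \big| e^{i\omega\alpha} - k(\omega) \big|^2\, d\omega \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} \left| e^{i\omega\alpha}\, \Psi(\omega) - \int_0^{\infty} \psi(t + \alpha)\, e^{-i\omega t}\, dt \right|^2 d\omega \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} \left| \int_{-\alpha}^{0} \psi(t + \alpha)\, e^{-i\omega t}\, dt \right|^2 d\omega \\
&= \int_0^{\alpha} |\psi(\sigma)|^2\, d\sigma.
\end{aligned} \tag{2.042}$$

We shall denote this by the symbol $E(\alpha)$.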
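
The recipe of this section can also be carried out numerically in the time domain once $\psi(t)$ is known. The sketch below is an illustration under stated assumptions: $\psi$ is supplied as a callable that vanishes for negative $t$, the integrals are truncated at a finite upper limit, and a trapezoidal rule is used. The names are hypothetical.

```python
import cmath
import math


def trapezoid(func, a, b, n):
    """Composite trapezoidal rule; func may return real or complex values."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return total * h


def predictor_response(psi, alpha, omega, upper=40.0, n=40000):
    """k(omega) from (2.0393): transform of the advanced psi divided by that of psi."""
    advanced = trapezoid(lambda t: psi(t + alpha) * cmath.exp(-1j * omega * t), 0.0, upper, n)
    original = trapezoid(lambda t: psi(t) * cmath.exp(-1j * omega * t), 0.0, upper, n)
    return advanced / original


def prediction_error(psi, alpha, n=4000):
    """E(alpha) from (2.042): integral of |psi|^2 from 0 to alpha."""
    return trapezoid(lambda t: abs(psi(t)) ** 2, 0.0, alpha, n)


def psi_exponential(t):
    """Illustrative psi(t) = exp(-t) for t >= 0, corresponding to Phi = 1/(1 + omega^2)."""
    return math.exp(-t) if t >= 0 else 0.0


print(predictor_response(psi_exponential, 0.5, 2.0), math.exp(-0.5))
print(prediction_error(psi_exponential, 0.5), 0.5 * (1 - math.exp(-1.0)))
```

For this $\psi$ the ratio is $e^{-\alpha}$ at every frequency and the error is $\tfrac{1}{2}(1 - e^{-2\alpha})$, which rises from 0 at $\alpha = 0$ toward the total power $\varphi(0) = \tfrac{1}{2}$ as the lead increases.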










2.1 Examples of Prediction 

Let us now turn back to the technical detail of prediction. A few
examples will illustrate both the power of this method and some of the
limitations and precautions which must be observed in applying it. Let
our first case be

$$\Phi(\omega) = \frac{1}{1 + \omega^2}. \tag{2.10}$$

Then

$$\Psi(\omega) = \frac{1}{1 + i\omega}; \qquad \psi(t) = e^{-t} \quad (t > 0). \tag{2.11}$$

Accordingly,

$$k(\omega) = (1 + i\omega) \cdot \frac{e^{-\alpha}}{1 + i\omega} = e^{-\alpha}; \qquad E(\alpha) = \frac{1}{2}\left(1 - e^{-2\alpha}\right). \tag{2.12}$$

That is, the optimum prediction of $f(t + \alpha)$ is obtained by taking the
product of $f(t)$ by a factor $e^{-\alpha}$ tending to zero as $\alpha \to \infty$. At first sight
this seems surprising, for it looks as if this meant that $f(t)$ must tend to
zero in some way or other. As a matter of fact, it says only that the
predictable part of $f(t + \alpha)$ tends to zero, and that roughly the
unpredictable part is as likely to be positive as negative. In other words, the
values of $f(t + \alpha)$ will have a distribution centering about $e^{-\alpha} f(t)$. This
is altogether reasonable. Again, let us consider the example








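$$\Phi(\omega) = \frac{1}{1 + \omega^4}. \tag{2.13}$$

Here

$$\Psi(\omega) = \frac{1}{\left(\omega - \frac{1+i}{\sqrt{2}}\right)\left(\omega - \frac{-1+i}{\sqrt{2}}\right)}
 = \frac{\sqrt{2}/2}{\omega - \frac{1+i}{\sqrt{2}}} - \frac{\sqrt{2}/2}{\omega - \frac{-1+i}{\sqrt{2}}}. \tag{2.14}$$

Accordingly

$$\begin{aligned}
k(\omega) &= \frac{\dfrac{\sqrt{2}}{2}\, \dfrac{e^{i\alpha(1+i)/\sqrt{2}}}{\omega - \frac{1+i}{\sqrt{2}}} - \dfrac{\sqrt{2}}{2}\, \dfrac{e^{i\alpha(-1+i)/\sqrt{2}}}{\omega - \frac{-1+i}{\sqrt{2}}}}{\dfrac{\sqrt{2}/2}{\omega - \frac{1+i}{\sqrt{2}}} - \dfrac{\sqrt{2}/2}{\omega - \frac{-1+i}{\sqrt{2}}}} \\
&= i\omega \sqrt{2}\, e^{-\alpha/\sqrt{2}} \sin\frac{\alpha}{\sqrt{2}} + e^{-\alpha/\sqrt{2}} \left( \cos\frac{\alpha}{\sqrt{2}} + \sin\frac{\alpha}{\sqrt{2}} \right).
\end{aligned} \tag{2.15}$$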


What happens here is extremely interesting and suggestive. The k(ω)
which we obtain is not the Fourier transform of any function of limited
total variation, and the argument which we have given, interpreted
strictly, breaks down. On the other hand, the operation of multiplication
by k(ω) on the frequency scale corresponds to the operation

$$\sqrt2\,e^{-\alpha/\sqrt2}\sin\frac{\alpha}{\sqrt2}\,\frac{d}{dt} + e^{-\alpha/\sqrt2}\left(\cos\frac{\alpha}{\sqrt2}+\sin\frac{\alpha}{\sqrt2}\right) \tag{2.16}$$

on the time scale. This operator may well apply to all the functions f(t)
of the ensemble which we are predicting, and formally, at any rate, it
represents a perfectly usable prediction of the future. There is no
optimum prediction by means of functions K(t) of limited total variation,
although by such functions we may reduce the norm of the error of
prediction as nearly as we wish to the greatest lower bound of its value.
To attain the greatest lower bound, we must abandon the class of
operators determined by K(t) above and enter the class of differential
operators. Without going into the argument by which it may be seen
that this is really the rigorous situation, we may state that this is the
fact, and that the theory of prediction is complete in this case as well as
in the previous one. The essential point is that Ψ(ω), not k(ω), belongs
to L².
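
The closed form (2.15) can be checked numerically. The following is only a sketch under stated assumptions: it takes Ψ(ω) with poles at (±1 + i)/√2, uses the corresponding ψ(t) (worked out here, not quoted from the text), and approximates the one-sided transform of ψ(t + α) by a midpoint rule truncated at an arbitrary `t_max`. The function names are illustrative.

```python
import cmath
import math

SQ2 = math.sqrt(2.0)
A_POLE = (1 + 1j) / SQ2      # poles of Psi, both in the upper half-plane
B_POLE = (-1 + 1j) / SQ2

def Psi(w):
    """One admissible factor of Phi(w) = 1/(1 + w**4); |Psi|^2 = Phi on the real axis."""
    return 1.0 / ((w - A_POLE) * (w - B_POLE))

def psi(t):
    """Inverse transform of Psi for t >= 0 (it vanishes for t < 0)."""
    return -SQ2 * math.exp(-t / SQ2) * math.sin(t / SQ2)

def k_closed(w, alpha):
    """Right-hand side of (2.15)."""
    d = math.exp(-alpha / SQ2)
    s, c = math.sin(alpha / SQ2), math.cos(alpha / SQ2)
    return 1j * w * SQ2 * d * s + d * (c + s)

def k_numeric(w, alpha, t_max=40.0, steps=40000):
    """(1/Psi(w)) * integral_0^inf psi(t + alpha) e^{-i w t} dt, midpoint rule."""
    h = t_max / steps
    total = 0j
    for j in range(steps):
        t = (j + 0.5) * h
        total += psi(t + alpha) * cmath.exp(-1j * w * t)
    return total * h / Psi(w)

for w in (0.0, 0.7, 2.0):
    print(w, k_numeric(w, 0.8), k_closed(w, 0.8))
```

The two columns should agree to several decimal places; the term proportional to iω is the part of the operator that differentiates f(t).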

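Let
$$\Phi(\omega) = \frac{1}{(1+\omega^2)^2}, \tag{2.17}$$
so that
$$\Psi(\omega) = \frac{1}{(1+i\omega)^2} = -\frac{1}{(\omega-i)^2}. \tag{2.171}$$
Here
$$k(\omega) = (\omega-i)^2\,e^{-\alpha}\left[\frac{i\alpha}{\omega-i}+\frac{1}{(\omega-i)^2}\right] = e^{-\alpha}\left[i\alpha(\omega-i)+1\right] \tag{2.172}$$
$$= \alpha e^{-\alpha}\,i\omega + e^{-\alpha}(1+\alpha);$$
$$E(\alpha) = \tfrac{1}{4}\left\{1-(1+2\alpha+2\alpha^2)e^{-2\alpha}\right\}.$$
We thus again meet the situation in which the prediction operator is a
differential operator.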
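
As a quick check on the error formulas of these examples, the sketch below integrates |ψ(t)|² from 0 to α numerically and compares the result with the closed forms. It assumes ψ(t) = e^{-t} for Φ(ω) = 1/(1 + ω²) and ψ(t) = t e^{-t} for Φ(ω) = 1/(1 + ω²)²; the helper names and the midpoint rule are illustrative choices.

```python
import math

def mse_from_psi(psi, alpha, steps=20000):
    """Approximate E(alpha) = integral_0^alpha |psi(t)|^2 dt by the midpoint rule."""
    h = alpha / steps
    return h * sum(abs(psi((j + 0.5) * h)) ** 2 for j in range(steps))

def psi_first(t):
    # psi for Phi(w) = 1/(1 + w^2)
    return math.exp(-t)

def psi_third(t):
    # psi for Phi(w) = 1/(1 + w^2)^2
    return t * math.exp(-t)

def closed_first(alpha):
    return 0.5 * (1.0 - math.exp(-2.0 * alpha))

def closed_third(alpha):
    return 0.25 * (1.0 - (1.0 + 2.0 * alpha + 2.0 * alpha ** 2) * math.exp(-2.0 * alpha))

for alpha in (0.5, 1.0, 3.0):
    print(alpha,
          mse_from_psi(psi_first, alpha), closed_first(alpha),
          mse_from_psi(psi_third, alpha), closed_third(alpha))
```

For large α both errors approach the total power of the series (1/2 and 1/4 respectively), which is the error of predicting nothing at all.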


























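Let
$$\Phi(\omega) = \frac{1+\omega^2}{1+\omega^4}. \tag{2.18}$$
Here
$$\Psi(\omega) = \frac{\dfrac{1+i(1-\sqrt2)}{2}}{\omega-\dfrac{1+i}{\sqrt2}} + \frac{\dfrac{1-i(1-\sqrt2)}{2}}{\omega+\dfrac{1-i}{\sqrt2}}, \tag{2.181}$$
$$k(\omega) = \frac{1}{\Psi(\omega)}\left[\frac{\dfrac{1+i(1-\sqrt2)}{2}\,e^{i\alpha(1+i)/\sqrt2}}{\omega-\dfrac{1+i}{\sqrt2}} + \frac{\dfrac{1-i(1-\sqrt2)}{2}\,e^{-i\alpha(1-i)/\sqrt2}}{\omega+\dfrac{1-i}{\sqrt2}}\right] = \frac{A\omega+Bi}{\omega-i}, \tag{2.182}$$
where
$$A = e^{-\alpha/\sqrt2}\left[\cos\frac{\alpha}{\sqrt2} + (\sqrt2-1)\sin\frac{\alpha}{\sqrt2}\right];$$
$$B = e^{-\alpha/\sqrt2}\left[(\sqrt2-1)\sin\frac{\alpha}{\sqrt2} - \cos\frac{\alpha}{\sqrt2}\right];$$
$$E(\alpha) = \frac{\sqrt2}{2} - e^{-\alpha\sqrt2}\left[(\sqrt2-1) + \left(1-\frac{\sqrt2}{2}\right)\cos\alpha\sqrt2\right]. \tag{2.19}$$

Here again the prediction operator does not involve differentiation.
This will in general be the case when and only when the denominator of
Φ(ω) is two degrees higher than the numerator, although there may be
particular values of α which give a prediction free from differentiation in
other cases.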


2.2 A Limiting Example of Prediction 


The basis of our method of prediction is the separation of Φ(ω) into
two factors, one of which is bounded and free from zeros in every
half-plane above a horizontal line above the real axis, while the other is
similarly bounded and free from zeros in every half-plane below a
horizontal line below the real axis. There are of course functions Φ(ω) for
which this factorization breaks down. It is very interesting to observe
how the optimum prediction behaves in the neighborhood of a function
Φ(ω) for which it breaks down, and for which the factorization into
$\Psi(\omega)\cdot\overline{\Psi(\omega)}$ is impossible. Such a Φ(ω) is furnished by e^{-ω²}. An
approximate e^{-ω²}, for which prediction is possible, is given by
$$\frac{1}{\left(1+\dfrac{\omega^2}{n}\right)^{n}}. \tag{2.20}$$
The corresponding Ψ(ω) is
$$\left(1+\frac{i\omega}{\sqrt n}\right)^{-n}, \tag{2.21}$$
which gives as a ψ(t),
$$\frac{n^{n/2}\,t^{n-1}e^{-t\sqrt n}}{(n-1)!}. \tag{2.215}$$
The mean square error of prediction for lead α is
$$\int_0^\alpha \frac{n^n\,t^{2n-2}e^{-2t\sqrt n}}{[(n-1)!]^2}\,dt. \tag{2.22}$$
By differentiation, we see that the integrand is a maximum for a value of
t given by the solution of
$$(2n-2)\,t^{2n-3}e^{-2t\sqrt n} - 2\sqrt n\,t^{2n-2}e^{-2t\sqrt n} = 0. \tag{2.225}$$
This will be
$$t = \frac{n-1}{\sqrt n}.$$
Let us put
$$t = \tau + \frac{n-1}{\sqrt n}.$$
Then the integrand becomes
$$\frac{n^n\left(\tau+\frac{n-1}{\sqrt n}\right)^{2n-2}e^{-2\tau\sqrt n-2(n-1)}}{[(n-1)!]^2} = \frac{(n-1)^{2n-2}\,n\,e^{2-2n}}{[(n-1)!]^2}\left(1+\frac{\tau\sqrt n}{n-1}\right)^{2n-2}e^{-2\tau\sqrt n}$$
$$= \frac{(n-1)^{2n-2}\,n\,e^{2-2n}}{[(n-1)!]^2}\,e^{-n\tau^2/(n-1)}\left(1 + \text{a term of order not exceeding } \frac{\tau^3}{\sqrt n}\right). \tag{2.23}$$
Asymptotically
$$\frac{(n-1)^{2n-2}\,n\,e^{2-2n}}{[(n-1)!]^2} \sim \frac{(n-1)^{2n-2}\,n\,e^{2-2n}}{(n-1)^{2n-2}e^{2-2n}\,2\pi(n-1)} \sim \frac{1}{2\pi}. \tag{2.24}$$
Thus
$$\frac{1}{2\pi}\int_0^\alpha e^{-(t-\sqrt n)^2}\,dt \tag{2.25}$$
will be a good representation of the error in prediction for large n. If
α is substantially less than √n, prediction will be good, and if it is
substantially greater than √n, prediction will be bad. Approximately,
√n is the period for which good prediction is possible.
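
The role of √n as the horizon of good prediction can be seen numerically. The sketch below evaluates the integral (2.22) with a midpoint rule and divides it by its limit for infinite lead, which is the total power of the series. The closed form used for that limit (a gamma integral) and the function names are assumptions of this illustration, and logarithms are used only to avoid overflow for large n.

```python
import math

def prediction_error(n, alpha, steps=20000):
    """Integral (2.22): int_0^alpha n^n t^(2n-2) e^(-2 t sqrt(n)) / ((n-1)!)^2 dt."""
    log_c = n * math.log(n) - 2.0 * math.lgamma(n)      # log of n^n / ((n-1)!)^2
    root = math.sqrt(n)
    h = alpha / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += math.exp(log_c + (2 * n - 2) * math.log(t) - 2.0 * t * root)
    return total * h

def total_power(n):
    """Limit of (2.22) as alpha -> infinity: (2n-2)! n^n / ((n-1)!^2 (2 sqrt n)^(2n-1))."""
    return math.exp(math.lgamma(2 * n - 1) + n * math.log(n)
                    - 2.0 * math.lgamma(n)
                    - (2 * n - 1) * math.log(2.0 * math.sqrt(n)))

def relative_error(n, alpha):
    """Fraction of the total power that remains unpredicted at lead alpha."""
    return prediction_error(n, alpha) / total_power(n)

for n in (25, 100):
    root = math.sqrt(n)
    print(n, [relative_error(n, f * root) for f in (0.5, 1.0, 1.5)])
```

For leads well below √n the relative error is close to zero, and for leads well above √n it is close to one, with a transition of width of order unity around √n.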


Now
$$\left(1+\frac{\omega^2}{n}\right)^{-n} = \exp\left[-n\log\left(1+\frac{\omega^2}{n}\right)\right] = \exp\left[-\omega^2+\frac{\omega^4}{2n}-\cdots\right]. \tag{2.26}$$
To a first approximation,
$$\left(1+\frac{\omega^2}{n}\right)^{-n} = e^{-\omega^2}\left(1+\frac{\omega^4}{2n}\right). \tag{2.27}$$
Here the error term in comparison with e^{-ω²} is
$$\frac{\omega^4}{2n}\,e^{-\omega^2}, \tag{2.28}$$
which has its maximum when
$$\omega = \sqrt2.$$
Approximately, the absolute maximum error is
$$\frac{4e^{-2}}{2n} = \frac{2e^{-2}}{n} = \frac{2e^{-2}}{(\text{time of good prediction})^2}, \tag{2.29}$$
or
$$\text{time of good prediction} = \frac{\sqrt2\,e^{-1}}{\sqrt{\text{allowable maximum error of } \Phi(\omega)}}. \tag{2.295}$$

Thus, if Φ(ω) is allowed a tolerance of 10 per cent of Φ(ω) maximum, the
time of good prediction is about 1.6 seconds; if it is 1 per cent, the time
is about 5 seconds. These should be compared with the root mean square
of t taken with respect to φ(t), which in this instance is √2. It will be
seen that Φ(ω) must be known quite accurately to make any sort of
long-time prediction possible, and that this accuracy requirement
increases as the square of the desired lead.
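
The two figures quoted above follow directly from (2.295). A minimal sketch, with an illustrative function name, taking the maximum of Φ(ω) = e^{-ω²} to be 1 so that a tolerance of 10 per cent corresponds to an allowable error of 0.10:

```python
import math

def time_of_good_prediction(tolerance):
    """Equation (2.295): sqrt(2)/e divided by the square root of the allowable error."""
    return math.sqrt(2.0) * math.exp(-1.0) / math.sqrt(tolerance)

def error_term(w, n):
    """Error term (2.28): (w^4 / 2n) e^{-w^2}."""
    return w ** 4 / (2.0 * n) * math.exp(-w * w)

for tol in (0.10, 0.01):
    print(f"tolerance {tol:.2f}: about {time_of_good_prediction(tol):.2f} seconds")
```

Halving the allowable error lengthens the time of good prediction only by a factor √2, which is the square-law accuracy requirement stated in the text.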

This point is made here because it arises in certain schemes of
prediction proposed for use on the motion of airplanes, although it is not
easy to handle them adequately in this direct manner. It has been proposed
to predict the polar coordinates of the motion of an airplane, each on its
own merits, by means of independent linear predictors. For a
straight-line path, this would involve the prediction of an anti-tangent
curve. The Fourier transform of such a curve is far richer in higher
frequencies than the curve e^{-ω²} and yields a correspondingly more
unstable prediction. If we actually use the known velocities of planes and
shells, we shall find that any prediction of the course of a plane which
will be good enough to allow hitting the plane in the center part of its
course, where its angular velocity is greatest, but where, on the other
hand, the plane is nearest to the gun, will be so unstable elsewhere as to
demand an accuracy in the rectilinearity of the path of the plane and in
its tracking which is quite beyond reason. Important as is the method of
prediction given in this paper, it has strict limitations in practice† and
should never be used to determine a curve which may be determined in a
strictly geometrical manner. Statistical prediction is essentially a method
of refining a prediction which would be perfect by itself in an idealized
case but which is corrupted by statistical errors, either in the observed
quantity itself or in the observation. Geometrical facts must be
predicted geometrically and analytical facts analytically, leaving only
statistical facts to be predicted statistically.

† This is, in fact, true of any method of prediction.


2.3 The Prediction of Functions Whose Derivatives Possess 
Auto-correlation Coefficients 


Up to the present, we have been predicting functions which themselves
possess an auto-correlation coefficient. Let us now investigate the
prediction problem for functions whose first derivative possesses an
auto-correlation coefficient. Let f(t) be a function with a derivative
f′(t), with an auto-correlation coefficient φ₁(t), and let

$$\Phi_1(\omega) = \int_{-\infty}^{\infty}\varphi_1(t)e^{-i\omega t}\,dt; \qquad \varphi_1(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi_1(\omega)e^{i\omega t}\,d\omega. \tag{2.30}$$

Just as we have factored Φ(ω) into $\Psi(\omega)\cdot\overline{\Psi(\omega)}$, let us factor

$$\Phi_1(\omega) = \Psi_1(\omega)\cdot\overline{\Psi_1(\omega)}, \tag{2.305}$$

where Ψ₁(ω) is free from singularities and zeros in the lower half-plane.
Then the predicted value of f′(t + α) will be the result of applying to
f′(t) the operator

$$\frac{1}{2\pi\Psi_1(\omega)}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\Psi_1(u)e^{iu(t+\alpha)}\,du. \tag{2.31}$$

It is natural to regard the best value of f(t + β) − f(t) as the result of
integrating the best value of f′(t + α) from 0 to β; and indeed, if we are
looking for an optimum prediction in the least square sense, an
application of our method of minimization will give exactly this result.
That is, the best estimate of f(t + β) will be obtained by adding f(t) to
the result of applying the operator

$$\frac{1}{2\pi\Psi_1(\omega)}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\Psi_1(u)\,\frac{e^{iu(t+\beta)}-e^{iut}}{iu}\,du$$

to f′(t), or the operator

$$\frac{i\omega}{2\pi\Psi_1(\omega)}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\Psi_1(u)\,\frac{e^{iu(t+\beta)}-e^{iut}}{iu}\,du \tag{2.32}$$

to f(t). This yields the predicting operator

$$k(\omega) = 1 + \frac{i\omega}{2\pi\Psi_1(\omega)}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{e^{iu(t+\beta)}-e^{iut}}{iu}\,\Psi_1(u)\,du. \tag{2.325}$$

If Ψ₁(u) behaves like u at 0 (since it has no singularities in the lower
half-plane), this becomes

$$k(\omega) = \frac{i\omega}{2\pi\Psi_1(\omega)}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{\Psi_1(u)}{iu}\,e^{iu(t+\beta)}\,du, \tag{2.33}$$

so that, if we put

$$\Phi(u) = \frac{\Phi_1(u)}{u^2}; \qquad \Psi(u) = \frac{\Psi_1(u)}{iu}, \tag{2.335}$$

the form we have already established in (2.325) for the prediction
operator k(ω) remains valid.

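Let us generalize this. Let f^{(n)}(t) have the auto-correlation coefficient
φ_n(t), yielding Φ_n(ω) and Ψ_n(ω), as f(t) has yielded Φ(ω) and Ψ(ω) in
the original case. Let

$$\Phi(\omega) = \omega^{-2n}\,\Phi_n(\omega); \qquad \Psi(\omega) = (i\omega)^{-n}\,\Psi_n(\omega). \tag{2.34}$$
Then the predicting operator for lead β will be
$$k(\omega) = 1 + i\omega\beta + \cdots + \frac{(i\omega\beta)^{n-1}}{(n-1)!} + \frac{(i\omega)^n}{2\pi\Psi_n(\omega)}\int_0^\infty e^{-i\omega t}\,dt$$
$$\times\int_{-\infty}^{\infty}\frac{\Psi_n(u)e^{iut}}{(iu)^n}\left[e^{iu\beta} - 1 - iu\beta - \cdots - \frac{(iu\beta)^{n-1}}{(n-1)!}\right]du, \tag{2.345}$$

which will be a form in which we may write the predicting operator for
the case where f(t) itself possesses an auto-correlation coefficient. To
establish this, we follow exactly the same lines as before, noting that
Taylor's theorem with remainder asserts that

$$f(t+\beta) = f(t) + \beta f'(t) + \cdots + \frac{\beta^{n-1}}{(n-1)!}f^{(n-1)}(t) + \int_0^\beta d\tau_1\int_0^{\tau_1}d\tau_2\cdots\int_0^{\tau_{n-1}}d\tau_n\,f^{(n)}(t+\tau_n); \tag{2.35}$$
and that

$$\int_0^\beta d\tau_1\int_0^{\tau_1}d\tau_2\cdots\int_0^{\tau_{n-1}}d\tau_n\,e^{iu\tau_n} = \frac{1}{(iu)^n}\left[e^{iu\beta} - 1 - iu\beta - \cdots - \frac{(iu\beta)^{n-1}}{(n-1)!}\right]. \tag{2.355}$$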
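
The identity (2.355) is easy to test numerically. The sketch below is an illustration under one assumption not taken from the text: the n-fold iterated integral is collapsed into a single integral with the kernel (β − τ)^{n−1}/(n − 1)!, which is Cauchy's formula for repeated integration. Function names are hypothetical.

```python
import cmath
import math

def iterated_integral(u, beta, n, steps=4000):
    """n-fold iterated integral of e^{i u tau} over 0 < tau_n < ... < tau_1 < beta,
    evaluated as int_0^beta (beta - tau)^(n-1)/(n-1)! e^{i u tau} dtau (midpoint rule)."""
    h = beta / steps
    total = 0j
    for j in range(steps):
        tau = (j + 0.5) * h
        total += (beta - tau) ** (n - 1) * cmath.exp(1j * u * tau)
    return total * h / math.factorial(n - 1)

def closed_form(u, beta, n):
    """Right-hand side of (2.355): the exponential minus its first n Taylor terms."""
    z = 1j * u * beta
    remainder = cmath.exp(z) - sum(z ** k / math.factorial(k) for k in range(n))
    return remainder / (1j * u) ** n

for n in (1, 2, 3, 4):
    print(n, iterated_integral(1.3, 2.0, n), closed_form(1.3, 2.0, n))
```

The bracket in (2.345) is thus the transform-side image of the Taylor remainder in (2.35): the polynomial terms are carried by the present values of f and its derivatives, and only the remainder has to be predicted statistically.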

There are many cases in which these formulas are useful in practice.
For example, we may wish to predict the time of flight of a shell from an 
anti-aircraft gun to an airplane moving in a straight path. This is cer- 
tainly not a function with a finite mean, but for a considerable portion 
of the path it is a function with a second derivative not changing size 
very rapidly, and is not altogether unsuitable as a basis of a prediction. 

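The mean square error of the prediction established by our last k(ω)
will be

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi(\omega)\left|e^{i\omega\beta}-k(\omega)\right|^2 d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}d\omega\,\Bigg|\Psi_n(\omega)\,\frac{e^{i\omega\beta}-1-\cdots-\frac{(i\omega\beta)^{n-1}}{(n-1)!}}{(i\omega)^n}$$
$$-\frac{1}{2\pi}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\Psi_n(u)e^{iut}\,\frac{e^{iu\beta}-1-\cdots-\frac{(iu\beta)^{n-1}}{(n-1)!}}{(iu)^n}\,du\Bigg|^2, \tag{2.36}$$

as an n-fold integration by parts shows. Let us simplify this expression
by putting

$$\psi_n(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Psi_n(u)e^{iut}\,du \tag{2.365}$$
and

$$\psi(\beta,t) = \int_0^\beta d\sigma_1\int_0^{\sigma_1}d\sigma_2\cdots\int_0^{\sigma_{n-1}}\psi_n(t+\sigma_n)\,d\sigma_n. \tag{2.37}$$
Then the mean square error of prediction becomes
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}d\omega\left|\int_{-\infty}^{\infty}e^{-i\omega\sigma}\psi(\beta,\sigma)\,d\sigma - \int_0^\infty e^{-i\omega\sigma}\psi(\beta,\sigma)\,d\sigma\right|^2 = \int_{-\infty}^{0}|\psi(\beta,\sigma)|^2\,d\sigma. \tag{2.375}$$

Since ψ_n(t) vanishes for negative arguments, let us notice that, if t < 0,

$$\psi(\beta,t) = \int_0^\beta d\sigma_1\int_0^{\sigma_1}d\sigma_2\cdots\int_0^{\sigma_{n-1}}\psi_n(t+\sigma_n)\,d\sigma_n = \int_{-t}^{\beta}d\sigma_1\int_{-t}^{\sigma_1}d\sigma_2\cdots\int_{-t}^{\sigma_{n-1}}\psi_n(t+\sigma_n)\,d\sigma_n$$
$$= \int_0^{\beta+t}d\sigma_1\int_0^{\sigma_1}d\sigma_2\cdots\int_0^{\sigma_{n-1}}\psi_n(\sigma_n)\,d\sigma_n, \tag{2.38}$$

which vanishes for t < −β. Thus the mean square error of prediction becomes

$$\int_0^\beta d\sigma\left|\int_0^{\sigma}d\sigma_1\int_0^{\sigma_1}d\sigma_2\cdots\int_0^{\sigma_{n-1}}\psi_n(\sigma_n)\,d\sigma_n\right|^2. \tag{2.39}$$


2.4 Spectrum Lines and Non-absolutely Continuous Spectra
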


In all our prediction up to the present, we have confined our attention
to the practical case of functions f(t) generating absolutely continuous
spectral functions Λ(ω). In the more general case of discrete or
non-absolutely continuous spectra this theory of prediction has not yet
been as completely closed as in the specific case where Λ(ω) is absolutely
continuous, although the ground work for this has been laid by André
Kolmogoroff (loc. cit.). The main point in the generalization is that, if
Φ(ω) is defined in terms of Λ′(ω) wherever the latter exists, two cases must
be considered, the first where

$$\int_{-\infty}^{\infty}\frac{\left|\log|\Phi(\omega)|\right|}{1+\omega^2}\,d\omega \tag{2.40}$$
diverges, and the second where it converges. The first case is singular,
and methods may be found to predict f(t) to any desired degree of
accuracy on the basis of its past alone. If (2.40) converges, and Ψ(ω) is
defined as in (1.775), and

$$\psi(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Psi(\omega)e^{i\omega t}\,d\omega, \tag{2.401}$$

then the greatest lower bound of the mean square error of prediction for
lead α is

$$\int_0^\alpha|\psi(t)|^2\,dt. \tag{2.402}$$

Let us now proceed from the norm of the error of the optimum
prediction to the technique of obtaining such an optimum prediction. In the
case of discrete spectra, the technique to be followed is perhaps best
illustrated by the following example. Let φ(t) = e^{iλt} + φ₁(t), where

$$\varphi_1(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi(\omega)e^{i\omega t}\,d\omega. \tag{2.404}$$
Let
$$g(t) = f(t) - \frac{1}{T}\int_A^{A+T}f(t-\tau)e^{i\lambda\tau}\,d\tau. \tag{2.41}$$
Then

$$\lim_{S\to\infty}\frac{1}{2S}\int_{-S}^{S}g(t+\sigma)\overline{g(\sigma)}\,d\sigma = \varphi(t) - \frac{1}{T}\int_A^{A+T}\varphi(t+\tau)e^{-i\lambda\tau}\,d\tau - \frac{1}{T}\int_A^{A+T}\varphi(t-\tau)e^{i\lambda\tau}\,d\tau$$
$$+ \frac{1}{T^2}\int_A^{A+T}e^{-i\lambda\sigma}\,d\sigma\int_A^{A+T}\varphi(t+\sigma-\tau)e^{i\lambda\tau}\,d\tau$$
$$= \varphi_1(t) + (\text{an expression vanishing as } T\to\infty). \tag{2.42}$$

While it is true that the expression described verbally does not vanish
uniformly, the following procedure will give a prediction arbitrarily
near the optimum: we predict g(t) (for a large T) as though φ₁(t) were
its auto-correlation coefficient, and then add the "cancelling" term

$$\frac{1}{T}\int_A^{A+T}f(t-\tau)e^{i\lambda\tau}\,d\tau.$$

If φ(t) contains several spectrum lines, we may remove them one by one
in this fashion. This procedure may be used to give approximate results
even in case of an infinity of spectrum lines.
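
The effect of the line-removing operation (2.41) on a pure oscillation can be written in closed form, which makes the claim about large T concrete. The sketch below is only an illustration: the symbols `line` (the frequency of the spectrum line), `A` and `T` (the limits of the averaging window) follow one reading of the garbled original, and the function name is hypothetical.

```python
import cmath

def line_filter_gain(omega, line, A, T):
    """Complex factor by which
        g(t) = f(t) - (1/T) * int_A^{A+T} f(t - tau) e^{i*line*tau} dtau
    multiplies a pure oscillation f(t) = e^{i*omega*t}."""
    d = line - omega
    if d == 0.0:
        average = 1.0 + 0j                       # the line itself is averaged exactly
    else:
        average = (cmath.exp(1j * d * A)
                   * (cmath.exp(1j * d * T) - 1.0) / (1j * d * T))
    return 1.0 - average

# The line at frequency 2.0 is removed entirely; other frequencies are
# passed with a gain that approaches 1 as the window length T grows.
for T in (10.0, 100.0, 1000.0):
    print(T, abs(line_filter_gain(2.0, 2.0, 1.0, T)),
          abs(line_filter_gain(0.5, 2.0, 1.0, T)))
```

The deviation of the gain from 1 at a frequency ω off the line is at most 2/(|line − ω| T), so it shrinks with T but not uniformly in ω, which is the non-uniformity mentioned in the text.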


2.5 Prediction by the Linear Combination of Given Operators

Let us now discuss the problem of prediction by means of a linear
combination of given operators. We have seen that the mean square
error of prediction for the operator with the time characteristic K(t) and
the frequency characteristic k(ω) will be

$$\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\left|f(t+\alpha)-\int_0^\infty f(t-\tau)\,dK(\tau)\right|^2 dt$$
$$= \varphi(0) - 2\,\mathrm{R}\left\{\int_0^\infty\varphi(\alpha+\tau)\,d\overline{K(\tau)}\right\} + \int_0^\infty dK(\sigma)\int_0^\infty d\overline{K(\tau)}\,\varphi(\tau-\sigma)$$
$$= \frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi(\omega)\left|e^{i\alpha\omega}-k(\omega)\right|^2 d\omega. \tag{2.50}$$

Thus, if
$$k(\omega) = \sum_{n=1}^{N}a_n k_n(\omega), \tag{2.51}$$

the problem of prediction consists in approximating in the mean to
$\sqrt{\Phi(\omega)}\,e^{i\alpha\omega}$ by a sum $\sum_{1}^{N}a_n k_n(\omega)\sqrt{\Phi(\omega)}$. In the case in which Ψ(ω) exists,
it may be convenient to replace √Φ(ω) by Ψ(ω), which will make no
difference in the result. The important restriction on this problem is
that the admissible functions k_n(ω) are transforms of functions vanishing
for negative arguments, and themselves have no singularities or large
infinities of growth in the lower half-plane. If Φ(ω) is factorable and
Ψ(ω) exists, this will also be true of the functions k_n(ω)Ψ(ω), while, in
general, it will not be true of e^{iαω}Ψ(ω), so that in this case a perfect
solution of the prediction problem is impossible. If, on the other hand,
Φ(ω) is not factorable, the set of functions k(ω)√Φ(ω), where k(ω) has
the desired behavior in the lower half-plane and k(ω)√Φ(ω) belongs to
L², is closed. Otherwise there will exist a non-null function l(ω) of L²
such that always

$$\int_{-\infty}^{\infty}l(\omega)\sqrt{\Phi(\omega)}\,k(\omega)\,d\omega = 0, \tag{2.515}$$
and in particular
$$\int_{-\infty}^{\infty}\frac{l(\omega)\sqrt{\Phi(\omega)}}{\omega-\omega_1}\,d\omega = 0, \tag{2.52}$$

for all ω₁ above the axis of reals. It will follow that l(ω)√Φ(ω) is a possible
set of boundary values for a function analytic above the axis of reals,
and that hence

$$\int_{-\infty}^{\infty}\frac{\left|\log\left|l(\omega)\sqrt{\Phi(\omega)}\right|\right|}{1+\omega^2}\,d\omega < \infty, \tag{2.525}$$

which gives rise to (log⁺ and log⁻ denoting the positive and negative
parts of the logarithm)

$$-\int_{-\infty}^{\infty}\frac{\log^-\left|l(\omega)\sqrt{\Phi(\omega)}\right|}{1+\omega^2}\,d\omega < \infty. \tag{2.53}$$
Since l(ω) belongs to L²,
$$\int_{-\infty}^{\infty}\frac{\log^+|l(\omega)|}{1+\omega^2}\,d\omega < \infty. \tag{2.535}$$
Because
$$-\frac{1}{2}\int_{-\infty}^{\infty}\frac{\log^-|\Phi(\omega)|}{1+\omega^2}\,d\omega \le -\int_{-\infty}^{\infty}\frac{\log^-\left|l(\omega)\sqrt{\Phi(\omega)}\right|}{1+\omega^2}\,d\omega + \int_{-\infty}^{\infty}\frac{\log^+|l(\omega)|}{1+\omega^2}\,d\omega, \tag{2.54}$$
we arrive at
$$-\int_{-\infty}^{\infty}\frac{\log^-|\Phi(\omega)|}{1+\omega^2}\,d\omega < \infty. \tag{2.545}$$

As, however, Φ(ω) is absolutely integrable,

$$\int_{-\infty}^{\infty}\frac{\log^+|\Phi(\omega)|}{1+\omega^2}\,d\omega < \infty; \tag{2.55}$$
and on combining these, we get
$$\int_{-\infty}^{\infty}\frac{\left|\log|\Phi(\omega)|\right|}{1+\omega^2}\,d\omega < \infty, \tag{2.555}$$

which is known to contradict our assumption that Φ(ω) is not factorable.
In this case, a perfect prediction is possible, and we may obtain as
good a prediction as we wish by orthogonalizing a closed set such as

$$\frac{\sqrt{\Phi(\omega)}}{(1+i\omega)^n} \tag{2.56}$$
and expressing √Φ(ω) e^{iαω} in terms of these orthogonalized functions.
In the factorable case, we orthogonalize a set of functions Ψ(ω) · k_n(ω).
For example, if

$$\Phi(\omega) = \frac{1}{1+\omega^2}, \tag{2.565}$$

a possible set of functions Ψ(ω)k_n(ω) is the orthogonal set
$$\frac{(1-i\omega)^n}{\sqrt{\pi}\,(1+i\omega)^{n+1}}. \tag{2.57}$$

Here we solve the prediction problem by expanding Ψ(ω)e^{iαω} in terms
of this set; and since

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}\frac{e^{i\alpha\omega}}{1+i\omega}\cdot\frac{(1+i\omega)^n}{(1-i\omega)^{n+1}}\,d\omega = \begin{cases}\sqrt{\pi}\,e^{-\alpha} & \text{if } n = 0;\\[2pt] 0 & \text{if } n > 0;\end{cases} \tag{2.575}$$

we obtain as our prediction operator e^{-α}, as before.
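
That the set (2.57) is orthogonal, and in fact normalized, can be confirmed numerically. The following sketch assumes the indexing n = 0, 1, 2, ... and uses the substitution ω = tan(θ/2), which maps the real line onto one period of θ; both the substitution and the function names are choices made for this illustration.

```python
import math

def basis(n, w):
    """n-th function of the set (2.57): (1 - i w)^n / (sqrt(pi) (1 + i w)^(n+1))."""
    return (1 - 1j * w) ** n / (math.sqrt(math.pi) * (1 + 1j * w) ** (n + 1))

def inner(m, n, steps=4096):
    """Integral over the real line of basis(m) times the conjugate of basis(n),
    computed with w = tan(theta/2) and the midpoint rule in theta."""
    h = 2.0 * math.pi / steps
    total = 0j
    for j in range(steps):
        theta = -math.pi + (j + 0.5) * h
        w = math.tan(0.5 * theta)
        jacobian = 0.5 / math.cos(0.5 * theta) ** 2      # dw / dtheta
        total += basis(m, w) * basis(n, w).conjugate() * jacobian
    return total * h

for m in range(3):
    print([round(abs(inner(m, n)), 6) for n in range(3)])
```

Every member of the set has its only pole at ω = i, so each is an admissible Ψ(ω)k_n(ω); the n = 0 member is Ψ(ω)/√π itself, which is why the expansion of Ψ(ω)e^{iαω} reduces to a single term.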


2.6 The Linear Predictor for a Discrete Time Series 


The problem of prediction is of interest in the two closely related fields
of communication engineering and of time series in statistics. In the latter
field, while continuous time series do occur and are important, the
numerical data will generally be placed in the form of discrete time series.
In such a case the function f(t) of the continuous parameter t is replaced
by the function f_ν of the parameter ν, which varies by discrete steps.
Similarly the function φ(τ) will be replaced by the discrete set of
auto-correlation coefficients

$$\varphi_\nu = \lim_{N\to\infty}\frac{1}{2N+1}\sum_{\mu=-N}^{N}f_{\mu+\nu}\overline{f_\mu}. \tag{2.60}$$

The analogue of our previous function Φ(ω) will be the periodic function
$$\Phi(\omega) = \sum_{\nu=-\infty}^{\infty}\varphi_\nu e^{-i\nu\omega} \tag{2.605}$$

of period 2π. As may readily be proved, this periodic function will always
be non-negative.

The problem of factoring Φ(ω) occurs in the same way in the discrete
case as in the continuous case. One way of accomplishing this is to first
form log Φ(ω), which will have the Fourier series

$$\sum_{\nu=-\infty}^{\infty}a_\nu e^{-i\nu\omega}. \tag{2.610}$$

Under these conditions, let us put

$$L(\omega) = -\sum_{\nu=-\infty}^{-1}a_\nu e^{-i\nu\omega} + \sum_{\nu=1}^{\infty}a_\nu e^{-i\nu\omega} \tag{2.615}$$

and

$$\Psi(\omega) = \sqrt{\Phi(\omega)}\,\exp\left[\tfrac{1}{2}L(\omega)\right]. \tag{2.620}$$
Then we have
$$\Psi(\omega) = \sum_{\nu=0}^{\infty}\psi_\nu e^{-i\nu\omega}. \tag{2.625}$$

It will be clear that the function Ψ(ω) will have no singularities or zeros
below the axis of reals and that its logarithm in that half-plane will be as
small as possible at infinity. Let it be observed that the computation of
Ψ(ω) can be carried through with the aid of the harmonic analyzer and
elementary arithmetical tools.

Now the formula for k(ω) which we have already obtained in the
continuous case has, as an analogue, the formula

$$k(\omega) = \frac{1}{2\pi\Psi(\omega)}\sum_{\nu=0}^{\infty}e^{-i\nu\omega}\int_{-\pi}^{\pi}\Psi(u)e^{iu(\nu+\alpha)}\,du \tag{2.630}$$
in the discrete case. If we now write
$$K_\nu = \frac{1}{2\pi}\int_{-\pi}^{\pi}k(\omega)e^{i\nu\omega}\,d\omega, \tag{2.635}$$
then by the use of these K_ν's we minimize the expression
$$\lim_{N\to\infty}\frac{1}{2N+1}\sum_{\mu=-N}^{N}\left|f_{\mu+\alpha}-\sum_{\nu=0}^{\infty}f_{\mu-\nu}K_\nu\right|^2. \tag{2.640}$$
The minimum of this expression will be
$$\sum_{\nu=0}^{\alpha-1}|\psi_\nu|^2. \tag{2.645}$$

This checks precisely with Kolmogoroff's results for the non-absolutely
continuous case, although, as said before, he extends his result to include
more general spectra.
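
The logarithmic method of factoring translates almost line by line into a short computation. The sketch below is an illustration only: it replaces the harmonic analyzer by a plain discrete Fourier sum on a uniform grid, adopts the sign convention Ψ(ω) = Σ ψ_ν e^{-iνω}, and uses as a test case the spectrum Φ(ω) = 1/|1 − r e^{-iω}|², for which ψ_ν = r^ν. The grid size, the truncation of the series, and all function names are assumptions.

```python
import cmath
import math

def factor_spectrum(phi, n_grid=256, n_coef=32):
    """Return psi_0 .. psi_{n_coef-1} with Psi(w) = sum psi_nu e^{-i nu w}
    and |Psi(w)|^2 = phi(w), by the log-Fourier method of (2.610)-(2.625)."""
    grid = [2.0 * math.pi * j / n_grid for j in range(n_grid)]
    log_phi = [math.log(phi(w)) for w in grid]
    half = n_grid // 2
    # a_nu in log Phi(w) = sum a_nu e^{-i nu w}, for nu = 0 .. half-1
    a = [sum(lp * cmath.exp(1j * nu * w) for lp, w in zip(log_phi, grid)) / n_grid
         for nu in range(half)]
    # sqrt(Phi) * exp(L/2) = exp(a_0/2 + sum_{nu>0} a_nu e^{-i nu w})
    psi_grid = [cmath.exp(0.5 * a[0]
                          + sum(a[nu] * cmath.exp(-1j * nu * w) for nu in range(1, half)))
                for w in grid]
    # psi_nu = (1/2pi) * integral of Psi(w) e^{i nu w} over one period
    return [sum(p * cmath.exp(1j * nu * w) for p, w in zip(psi_grid, grid)) / n_grid
            for nu in range(n_coef)]

def prediction_error(psi, lead):
    """Minimum mean square error (2.645) for an integer lead."""
    return sum(abs(c) ** 2 for c in psi[:lead])

R = 0.6

def ar1_spectrum(w):
    return 1.0 / abs(1.0 - R * cmath.exp(-1j * w)) ** 2

psi = factor_spectrum(ar1_spectrum)
print([round(c.real, 6) for c in psi[:5]])
print(prediction_error(psi, 3))
```

For this test spectrum the error for lead α is (1 − r^{2α})/(1 − r²), so the printed value can be compared with the exact one.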

This is a completely satisfactory method for obtaining an optimum
prediction by means of the φ_ν of a discrete set of data. There is, however,
another method of factoring Φ(ω) into the product $\Psi(\omega)\cdot\overline{\Psi(\omega)}$ which
may be applied on occasion. Let us consider the case where Φ(ω) is the
polynomial

$$\sum_{\nu=-N}^{N}\varphi_\nu e^{-i\nu\omega}. \tag{2.650}$$
Let us solve the algebraic equation
$$\sum_{\nu=-N}^{N}\varphi_\nu e^{-i\nu\omega} = 0. \tag{2.655}$$

Then the roots
$$e^{-i\omega} = A + Bi \qquad (A^2+B^2>1) \tag{2.660}$$
will correspond to values of ω in the upper half-plane. Similarly the roots
$$e^{i\omega} = A - Bi \qquad (A^2+B^2>1) \tag{2.665}$$

will correspond to values of ω in the lower half-plane. Let us next form
the partial product

$$\prod\left[e^{-i\omega}-(A+Bi)\right]\left[e^{i\omega}-(A-Bi)\right]. \tag{2.670}$$

It will be seen without difficulty that this must be Φ(ω) itself, except for a
constant factor, and that the partial product

$$\prod\left[e^{-i\omega}-(A+Bi)\right], \quad\text{for all roots for which } A^2+B^2>1, \tag{2.675}$$

must be Ψ(ω), except for a constant factor. In this case we have replaced
the combination of logarithmic transformation and Fourier analysis by
the solution of an algebraic equation which may be of a high order. The
choice of methods depends on the degree of the equation and on the
instruments or other mathematical computation facilities available. In
either case the computation may eventually be carried out and always
gives us a valid and reasonable method for the handling of economic,
meteorological, geophysical, and other statistics.
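
For the smallest non-trivial polynomial spectrum the root method can be carried out by hand, and the sketch below does so in code. It is an illustration under stated assumptions: Φ(ω) has only the three real coefficients φ₀ and φ₁ = φ₋₁, Φ(ω) has no real zeros, and the constant factor is fixed by requiring |Ψ(0)|² = Φ(0). The function name is hypothetical.

```python
import cmath
import math

def factor_three_term(phi0, phi1):
    """Factor Phi(w) = phi1*e^{iw} + phi0 + phi1*e^{-iw} (phi1 real, nonzero)
    through the roots of the algebraic equation in z = e^{-iw}."""
    # Multiplying Phi by z gives the quadratic phi1*z^2 + phi0*z + phi1 = 0.
    disc = cmath.sqrt(phi0 * phi0 - 4.0 * phi1 * phi1)
    roots = [(-phi0 + disc) / (2.0 * phi1), (-phi0 - disc) / (2.0 * phi1)]
    outer = max(roots, key=abs)                  # the root with A^2 + B^2 > 1
    # Fix the constant factor so that |Psi(0)|^2 = Phi(0).
    scale = math.sqrt(phi0 + 2.0 * phi1) / abs(1.0 - outer)

    def Psi(w):
        return scale * (cmath.exp(-1j * w) - outer)

    return outer, Psi

# Example: Phi(w) = |1 + 0.5 e^{-iw}|^2 = 1.25 + 0.5 e^{iw} + 0.5 e^{-iw}.
outer, Psi = factor_three_term(1.25, 0.5)
print(outer)
for w in (0.0, 1.0, 2.5):
    print(w, abs(Psi(w)) ** 2, 1.25 + 2 * 0.5 * math.cos(w))
```

The two roots are reciprocal to one another, which is the pairing expressed by (2.660) and (2.665); keeping only the root outside the unit circle leaves a factor whose zero lies in the upper half-plane of ω.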


CHAPTER III


THE LINEAR FILTER FOR A SINGLE TIME SERIES 


3.0 Formulation of the General Filter Problem 


Let f(t) and g(t) be two complex time series. Let f(t) represent a
message and g(t) a disturbance. We wish to determine that linear operator
which, when applied to f(t) + g(t), will give us the best approximation
to f(t + α). In order to do this, we need statistical information
concerning f(t) and g(t), which will be given by their auto-correlation and
cross-correlation coefficients. We shall suppose that sufficient auxiliary 
conditions are satisfied to permit the free shifting of the time origin of 
the series as justified by the lemmas of the last chapter. Of course, the 
results of this chapter will include those of the last in the particular case 
in which g(t) vanishes. In order to avoid as far as possible the duplication 
of arguments already given, we shall use a slightly different approach to 
the solution of the integral equation both of this chapter and of the last, 
although the methods of both chapters are applicable to either case. 

Formally, the problem of this chapter exhibits certain differences 
according as the sign of α is positive or negative. If it is positive, we again
have a prediction problem before us, albeit a prediction problem from 
perturbed data. If we start from correlation coefficients with rational 
Fourier transforms, the prediction operator on the frequency scale will 
likewise be rational. On the other hand, if α is negative, we arrive in the
first instance at an operator which will be realizable, in that it will be a 
function of a complex variable with the desired behavior in one half- 
plane, but which will not be rational and will require another stage of 
approximation before it can be realized in an electric or mechanical 
network of a finite number of meshes. Offsetting this disadvantage, it 
will perform its filtering function as between the message f(t) and the 
noise or disturbance g(t) more perfectly and with increasing perfection 
as α → −∞. In general the spectra of f(t) and g(t) will overlap, and
when this is the case, not even an infinite lag in filtering will give perfect 
discrimination. The degree by which this discrimination fails to be 
perfect will determine the lag which may advantageously be designed 
into the filter and indirectly establish the number of network meshes 
justified. 

The significance of lag in a filter may be seen from the following con- 
sideration: we shall find that, even though we approach the problem 
from the standpoint of time characteristic, we eventually arrive at a
filter which is essentially an instrument for the separation of different 
frequency ranges. However, to determine the spectrum of an impulse 
with perfect sharpness, we must know the history of that impulse over 
all time. This we can never do, but the longer we wait before operating 
with the impulse, the more of that impulse has had an opportunity to 
come through and the more completely is its spectrum determined. 

While our treatment of the filter, like all treatments, ultimately
involves the spectra of the message and the noise or perturbations which
it is supposed to separate, our criterion of performance is the mean square 
distortion of the message over the time and involves transient behavior 
as well as steady-state behavior. It is quite as critical of distortion of 
phase as of distortion of amplitude. In this it differs from methods more 
familiar in communication engineering theory, which have been devised 
for the most part with reference to voice-transmission and ultimately 
with reference to the human ear—an organ of unusually fine frequency 
discrimination, fair amplitude discrimination, and very bad phase 
discrimination. However, in television work and in other related tech- 
niques, the final receiving instrument is highly sensitive to phase, and 
existing filter techniques have proved distinctly inadequate. This has 
also been experienced in the case of mechanical filters, designed to 
smooth out irregularities in the input of accurate servo-mechanisms. It 
is hoped that the methods of this chapter may prove useful in these 
fields. 


3.1 Minimization Problem for Filters 
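Let us consider the expression

$$\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\left|f(t+\alpha)-\int_0^\infty\left[f(t-\tau)+g(t-\tau)\right]dK(\tau)\right|^2 dt. \tag{3.10}$$

Under very general conditions, this expression may be written

$$\varphi_{11}(0) - 2\,\mathrm{R}\left\{\int_0^\infty\left[\varphi_{11}(\alpha+\tau)+\varphi_{12}(\alpha+\tau)\right]d\overline{K(\tau)}\right\}$$
$$+\int_0^\infty dK(\sigma)\int_0^\infty d\overline{K(\tau)}\left[\varphi_{11}(\tau-\sigma)+\varphi_{12}(\tau-\sigma)+\overline{\varphi_{12}(\sigma-\tau)}+\varphi_{22}(\tau-\sigma)\right], \tag{3.105}$$
where

$$\varphi_{11}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}f(t+\tau)\overline{f(t)}\,dt, \tag{3.11}$$
$$\varphi_{12}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}f(t+\tau)\overline{g(t)}\,dt, \tag{3.111}$$
$$\varphi_{22}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}g(t+\tau)\overline{g(t)}\,dt. \tag{3.112}$$
Let us put
$$\varphi_{11}(\tau)+\varphi_{12}(\tau)+\overline{\varphi_{12}(-\tau)}+\varphi_{22}(\tau) = \varphi(\tau). \tag{3.12}$$
Then clearly
$$\varphi(\tau) = \overline{\varphi(-\tau)}, \tag{3.125}$$

and the expression which we wish to minimize will be
$$\varphi_{11}(0) - 2\,\mathrm{R}\left\{\int_0^\infty\left[\varphi_{11}(\alpha+\tau)+\varphi_{12}(\alpha+\tau)\right]d\overline{K(\tau)}\right\} + \int_0^\infty dK(\sigma)\int_0^\infty d\overline{K(\tau)}\,\varphi(\tau-\sigma). \tag{3.13}$$

If we write

$$\varphi_{11}(\alpha+\tau)+\varphi_{12}(\alpha+\tau) = h(\tau), \tag{3.135}$$

this minimization is reduced to that of the general real expression

$$-2\,\mathrm{R}\left\{\int_0^\infty h(\tau)\,d\overline{K(\tau)}\right\} + \int_0^\infty dK(\sigma)\int_0^\infty d\overline{K(\tau)}\,\varphi(\tau-\sigma). \tag{3.14}$$

Let us suppose that
$$h(\tau) = \int_0^\infty\varphi(\tau-\sigma)\,dQ(\sigma). \tag{3.145}$$
Then (3.14) becomes

$$-\int_0^\infty dQ(\sigma)\int_0^\infty\varphi(\tau-\sigma)\,d\overline{K(\tau)} - \int_0^\infty d\overline{Q(\sigma)}\int_0^\infty\overline{\varphi(\tau-\sigma)}\,dK(\tau) + \int_0^\infty dK(\sigma)\int_0^\infty d\overline{K(\tau)}\,\varphi(\tau-\sigma)$$
$$= -\int_0^\infty dQ(\sigma)\int_0^\infty\varphi(\tau-\sigma)\,d\overline{Q(\tau)} + \int_0^\infty d\left[K(\sigma)-Q(\sigma)\right]\int_0^\infty d\left[\overline{K(\tau)}-\overline{Q(\tau)}\right]\varphi(\tau-\sigma).$$
Let us now suppose that

$$\int_0^\infty e^{-i\omega t}\,dK(t) = k(\omega); \tag{3.15}$$
$$\int_0^\infty e^{-i\omega t}\,dQ(t) = q(\omega); \tag{3.16}$$
$$\varphi(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi(\omega)e^{i\omega t}\,d\omega. \tag{3.165}$$

Then the expression to be minimized becomes
$$-\frac{1}{2\pi}\int_{-\infty}^{\infty}|q(\omega)|^2\,\Phi(\omega)\,d\omega + \frac{1}{2\pi}\int_{-\infty}^{\infty}|k(\omega)-q(\omega)|^2\,\Phi(\omega)\,d\omega, \tag{3.17}$$

which will attain its minimum when
$$q(\omega) = k(\omega), \tag{3.175}$$
which minimum will be

$$-\frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi(\omega)\,|q(\omega)|^2\,d\omega. \tag{3.18}$$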


3.2 The Factorization of the Spectrum 


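In general, let us consider the integral equation

$$h(t) = \int_0^\infty\varphi(t-\tau)\,dK(\tau) \qquad (t>0); \tag{3.20}$$

where
$$h(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}H(\omega)e^{i\omega t}\,d\omega, \tag{3.205}$$
$$\varphi(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi(\omega)e^{i\omega t}\,d\omega, \tag{3.21}$$

and $\varphi(t) = \overline{\varphi(-t)}$, so that Φ(ω) is real. We shall suppose that
$$\int_0^\infty e^{-i\omega t}\,dK(t) = k(\omega). \tag{3.215}$$

For the present, let us confine our attention to the case in which Φ(ω) is
rational. Here we shall limit ourselves to the case in which Φ(ω) has no
real zeros. We may then write

$$\Phi(\omega) = |\Psi(\omega)|^2, \tag{3.22}$$
where Ψ(ω) is a rational function free from zeros and poles in the lower
half-plane. If we put

$$\psi(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Psi(\omega)e^{i\omega t}\,d\omega = 0 \quad\text{if } t<0, \tag{3.225}$$

then

$$\varphi(\tau) = \begin{cases}\displaystyle\int_0^\infty\psi(t+\tau)\overline{\psi(t)}\,dt & (\tau>0);\\[8pt] \displaystyle\int_{-\tau}^\infty\psi(t+\tau)\overline{\psi(t)}\,dt & (\tau<0).\end{cases}$$
Then (3.20) becomes
$$h(t) = \int_0^\infty\varphi(t-\tau)\,dK(\tau) = \int_0^\infty dK(\tau)\int_0^\infty\psi(t-\tau+\sigma)\overline{\psi(\sigma)}\,d\sigma$$
$$= \int_0^\infty\overline{\psi(\sigma)}\,d\sigma\int_0^\infty\psi(t-\tau+\sigma)\,dK(\tau) \qquad (t>0). \tag{3.24}$$
Now let us put

$$h(t) = \int_0^\infty\overline{\psi(\sigma)}\,l(t+\sigma)\,d\sigma \qquad (-\infty<t<\infty). \tag{3.245}$$

We see that, if we extend ψ and dK to cover negative arguments, for
which dK is to vanish,

$$\int_{-\infty}^{\infty}\psi(t-\tau)\,dK(\tau) = l(t) \qquad (t>0). \tag{3.25}$$
If
$$L(\omega) = \int_{-\infty}^{\infty}l(t)e^{-i\omega t}\,dt, \tag{3.255}$$
equation (3.245) becomes
$$H(\omega) = L(\omega)\cdot\overline{\Psi(\omega)}, \tag{3.26}$$
[where H(ω) is the Fourier transform of h(t)] which leads to
$$L(\omega) = \frac{H(\omega)}{\overline{\Psi(\omega)}}, \tag{3.265}$$
or
$$l(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{H(\omega)}{\overline{\Psi(\omega)}}\,e^{i\omega t}\,d\omega. \tag{3.27}$$
Then (3.25) becomes

$$\Psi(\omega)k(\omega) = \int_0^\infty l(t)e^{-i\omega t}\,dt, \tag{3.275}$$
which again leads to

$$k(\omega) = \frac{1}{2\pi\Psi(\omega)}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{H(u)}{\overline{\Psi(u)}}\,e^{iut}\,du. \tag{3.28}$$

The denominator of this expression is a rational function free from zeros
in the lower half-plane, while the numerator is free from singularities in
the lower half-plane, and if l(t) satisfies the condition

$$\int_0^\infty|l(t)|\,dt < \infty, \tag{3.285}$$

it will be bounded in that half-plane. Thus k(ω) is of at most rational
growth in the lower half-plane, in which it is free of singularities. If

$$k(\omega) = \int_{-\infty}^{\infty}e^{-i\omega t}\,dK_1(t), \tag{3.29}$$
then K₁(t) is constant for negative arguments and is a possible K(t).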


3.3 Prediction and Filtering 
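Let us consider two cases in particular. To begin with, let

$$h(t) = \varphi(t+\alpha). \tag{3.30}$$
In this case,
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{H(u)}{\overline{\Psi(u)}}\,e^{iut}\,du = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\Phi(u)}{\overline{\Psi(u)}}\,e^{iu(t+\alpha)}\,du = \begin{cases}\psi(t+\alpha) & (t>-\alpha),\\ 0 & (t<-\alpha).\end{cases} \tag{3.305}$$
This will yield us
$$k(\omega) = \frac{\displaystyle\int_0^\infty\psi(t+\alpha)e^{-i\omega t}\,dt}{\Psi(\omega)}. \tag{3.31}$$
If Φ(ω) is rational, this will be a rational expression. In fact, if
$$\Psi(\omega) = \sum_{m,n}\frac{a_{mn}}{(\omega-\omega_n)^m}, \tag{3.315}$$
we shall have
$$\psi(t) = \frac{1}{2\pi}\sum_{m,n}a_{mn}\int_{-\infty}^{\infty}\frac{e^{iut}}{(u-\omega_n)^m}\,du. \tag{3.32}$$
Now, for t > 0,
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{iut}}{(u-\omega_n)^m}\,du = \frac{i^m\,t^{m-1}e^{i\omega_n t}}{(m-1)!},$$
and hence
$$\psi(t) = \sum_{m,n}a_{mn}\,\frac{i^m\,t^{m-1}e^{i\omega_n t}}{(m-1)!} \qquad (t>0).$$
As a consequence of this,
$$k(\omega) = \frac{1}{\Psi(\omega)}\sum_{m,n}a_{mn}\,e^{i\omega_n\alpha}\sum_{p=1}^{m}\frac{(i\alpha)^{m-p}}{(m-p)!\,(\omega-\omega_n)^{p}}.$$
Again, let H(ω) be of the form M(ω)e^{iαω}, where M(ω) is rational and has
no real zeros or singularities and has a denominator of higher degree
than the numerator. Then we may write
$$\frac{M(\omega)}{\overline{\Psi(\omega)}} = \sum_{m,n}\frac{a_{mn}}{(\omega-\omega_n)^m} + \sum_{\mu,\nu}\frac{b_{\mu\nu}}{(\omega-\sigma_\nu)^\mu},$$
where the ω_n lie above and the σ_ν below the axis of reals, and
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{M(u)}{\overline{\Psi(u)}}\,e^{iut}\,du = \begin{cases}\displaystyle\sum_{m,n}a_{mn}\,\frac{i^m\,t^{m-1}e^{i\omega_n t}}{(m-1)!} & (t>0),\\[8pt] \displaystyle-\sum_{\mu,\nu}b_{\mu\nu}\,\frac{i^\mu\,t^{\mu-1}e^{i\sigma_\nu t}}{(\mu-1)!} & (t<0).\end{cases}$$
Let us call this function m(t). Then, if α is positive,
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{H(u)}{\overline{\Psi(u)}}\,e^{iut}\,du = \sum_{m,n}a_{mn}\,\frac{i^m}{(m-1)!}\,(t+\alpha)^{m-1}e^{i\omega_n(t+\alpha)} \qquad (t>0). \tag{3.365}$$

If, on the other hand, α is negative,
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{H(u)}{\overline{\Psi(u)}}\,e^{iut}\,du = \begin{cases}\displaystyle\sum_{m,n}a_{mn}\,\frac{i^m}{(m-1)!}\,(t+\alpha)^{m-1}e^{i\omega_n(t+\alpha)} & (t>-\alpha);\\[8pt] \displaystyle-\sum_{\mu,\nu}b_{\mu\nu}\,\frac{i^\mu}{(\mu-1)!}\,(t+\alpha)^{\mu-1}e^{i\sigma_\nu(t+\alpha)} & (0<t<-\alpha).\end{cases} \tag{3.37}$$

Thus the cases of lead and of lag must be treated differently. In the
first case,

$$k(\omega) = \frac{1}{\Psi(\omega)}\sum_{m,n}a_{mn}\,e^{i\omega_n\alpha}\sum_{p=1}^{m}\frac{(i\alpha)^{m-p}}{(m-p)!\,(\omega-\omega_n)^{p}}, \tag{3.375}$$

which is rational if Φ(ω) is rational. In the second case, k(ω) will not be
rational; and, if the function k(ω) is to be approximated as the voltage
ratio of an electric circuit, a further investigation is needed.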


3.4 The Error of Performance of a Filter; Long-lag Filters 


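In any case, the minimum of

$$-2\,\mathrm{R}\left\{\int_0^\infty h(\tau)\,d\overline{K(\tau)}\right\} + \int_0^\infty dK(\sigma)\int_0^\infty d\overline{K(\tau)}\,\varphi(\tau-\sigma) \tag{3.40}$$

will be given by

$$-\frac{1}{2\pi}\int_{-\infty}^{\infty}d\omega\left|\frac{1}{2\pi}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{H(u)}{\overline{\Psi(u)}}\,e^{iut}\,du\right|^2. \tag{3.405}$$
If we put
$$h(t) = \varphi_{11}(t+\alpha)+\varphi_{12}(t+\alpha),$$
the minimum of

$$\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\left|f(t+\alpha)-\int_0^\infty\left[f(t-\tau)+g(t-\tau)\right]dK(\tau)\right|^2 dt \tag{3.415}$$
will be given by
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}d\omega\left[\Phi_{11}(\omega)-\left|\frac{1}{2\pi}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{H(u)}{\overline{\Psi(u)}}\,e^{iut}\,du\right|^2\right]. \tag{3.42}$$
If g(t) vanishes, this will be
$$\int_0^\alpha|\psi(t)|^2\,dt. \tag{3.425}$$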
In general 
™ I eter 
7a E Amn one P at 
( —1 yi (m+p—2) 1 
“tie oe aie 


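and this may be used to determine the minimum of

$$\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\left|f(t+\alpha)-\int_0^\infty\left[f(t-\tau)+g(t-\tau)\right]dK(\tau)\right|^2 dt \tag{3.435}$$

whenever α > 0 and all the functions Φ_ij(ω) are rational. If, on the other
hand, α → −∞, we shall have for the limit of the minimum

$$\lim_{\alpha\to-\infty}\frac{1}{2\pi}\int_{-\infty}^{\infty}d\omega\left[\Phi_{11}(\omega)-\left|\frac{1}{2\pi}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{H(u)}{\overline{\Psi(u)}}\,e^{iut}\,du\right|^2\right]$$
$$= \frac{1}{2\pi}\int_{-\infty}^{\infty}\left[\Phi_{11}(\omega)-\left|\frac{M(\omega)}{\overline{\Psi(\omega)}}\right|^2\right]d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left[\Phi_{11}(\omega)-\frac{|\Phi_{11}(\omega)+\Phi_{12}(\omega)|^2}{\Phi(\omega)}\right]d\omega$$
$$= \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\Phi_{11}(\omega)\Phi_{22}(\omega)-|\Phi_{12}(\omega)|^2}{\Phi_{11}(\omega)+\Phi_{12}(\omega)+\overline{\Phi_{12}(\omega)}+\Phi_{22}(\omega)}\,d\omega. \tag{3.44}$$

In the particular case where f(t) and g(t) have zero cross-correlation
under any finite lag, Φ₁₂(ω) is identically zero, and the minimum value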
of (3.415) will tend to approach

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\Phi_{11}(\omega)\Phi_{22}(\omega)}{\Phi_{11}(\omega)+\Phi_{22}(\omega)}\,d\omega \tag{3.445}$$

as the lag −α becomes infinite. Whether the cross-correlation is zero or
not, let it be noted that f(t) and g(t) occur symmetrically in these
expressions.

We have now considered the performance of long-lag filters from the
point of view of their "norm" or figure of merit. It is appropriate to
consider the asymptotic form of their k(ω). This will be

$$k(\omega) = \frac{1}{2\pi\Psi(\omega)}\int_0^\infty e^{-i\omega t}\,dt\int_{-\infty}^{\infty}\frac{M(u)}{\overline{\Psi(u)}}\,e^{iu(t+\alpha)}\,du; \tag{3.45}$$

and if we consider only the asymptotic value of k(ω)e^{-iαω}, this will be

$$\frac{M(\omega)}{\Psi(\omega)\overline{\Psi(\omega)}} = \frac{\Phi_{11}(\omega)+\Phi_{12}(\omega)}{\Phi_{11}(\omega)+\Phi_{12}(\omega)+\overline{\Phi_{12}(\omega)}+\Phi_{22}(\omega)}. \tag{3.46}$$
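
The limiting error (3.445) and the limiting frequency characteristic (3.46) can be evaluated for a simple pair of spectra. The sketch below assumes, purely for illustration, a message spectrum Φ₁₁(ω) = 1/(1 + ω²), a flat noise spectrum Φ₂₂(ω) = ε², and no cross-correlation; the integral over the real line is computed with the substitution ω = tan θ. For this pair the integral has the closed form ε/(2√(1 + ε²)), which is used as a check. All names are hypothetical.

```python
import math

def long_lag_error(phi11, phi22, steps=20000):
    """(1/2pi) * integral over the real line of Phi11*Phi22/(Phi11+Phi22), as in (3.445),
    computed with w = tan(theta) and the midpoint rule in theta."""
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        theta = -0.5 * math.pi + (j + 0.5) * h
        w = math.tan(theta)
        a, b = phi11(w), phi22(w)
        total += a * b / (a + b) / math.cos(theta) ** 2   # dw = dtheta / cos^2(theta)
    return total * h / (2.0 * math.pi)

def long_lag_characteristic(phi11, phi22, w):
    """Asymptotic value (3.46) of k(w) e^{-i alpha w} when Phi12 vanishes."""
    return phi11(w) / (phi11(w) + phi22(w))

EPS = 0.5

def message(w):
    return 1.0 / (1.0 + w * w)

def noise(w):
    return EPS ** 2

print(long_lag_error(message, noise), EPS / (2.0 * math.sqrt(1.0 + EPS ** 2)))
print([long_lag_characteristic(message, noise, w) for w in (0.0, 1.0, 5.0)])
```

Even with infinite lag the error does not vanish, because the two spectra overlap everywhere; the characteristic passes each frequency in proportion to the share of the message in the total power at that frequency.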


3.5 Filters and Ergodic Theory 


A fundamental theorem in the theory of the Brownian motion asserts 
that, if the responses of two linear resonators to Brownian inputs, 
whether the same, different, or partly the same and partly different, 
have a zero cross-correlation coefficient with respect to the parameter of 
distribution of the Brownian motion, then they are not merely linearly 
independent, but (as the parameters of the Brownian distributions vary) 
they have entirely independent distributions. Now the problem of
optimum prediction is solved by reducing $f(t+\alpha)$ to a part linearly
dependent on the past of $f(t)$ and a part uncorrelated with the past of
$f(t)$. If the first part is

\[ \int_0^{\infty} f(t-\tau)\,dK(\tau), \tag{3.50} \]

then the second will be

\[ f(t+\alpha) - \int_0^{\infty} f(t-\tau)\,dK(\tau); \tag{3.505} \]

and by the ergodic theorem, the cross-correlation of the latter and
$f(t-s)$ with respect to the Brownian parameters of distribution will
almost always be the average in time, or

\[ \varphi(s+\alpha) - \int_0^{\infty} \varphi(s-\tau)\,dK(\tau), \tag{3.51} \]
which accordingly must vanish for positive $s$. This is the integral equation
of prediction. The integral equation of filtering, or

\[ \varphi_{11}(s+\alpha) - \int_0^{\infty} [\varphi_{11}(s-\tau) + \varphi_{22}(s-\tau)]\,dK(\tau) = 0 \quad (s > 0), \tag{3.52} \]

may be regarded similarly as the statement of the vanishing of a
cross-correlation, and in the Brownian case will be found to assert that the
error of performance of our optimum filter is wholly independent of
known data and thus wholly unpredictable. Of course, its distribution
will be known, but this will be a Gaussian distribution about zero
determined exclusively by the functions $\varphi_{ij}(t)$ and will have nothing
further to do with the function $f(t)$.
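
The vanishing cross-correlation can be illustrated numerically in a discrete setting. The sketch below is only an illustration under stated assumptions: it uses a first-order autoregressive series (not an example from the text), for which the least-squares one-step predictor is known to be `rho * x[t]`, and it replaces the ensemble average by a time average, as the ergodic theorem permits. The function name and parameters are hypothetical.

```python
import numpy as np

def prediction_error_correlations(rho=0.8, n=200_000, lead=1, max_lag=5, seed=0):
    """Simulate x[t] = rho*x[t-1] + w[t] and return the time-averaged
    products of the prediction error with past samples x[t-s], s = 0..max_lag."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = w[0]
    for t in range(1, n):
        x[t] = rho * x[t - 1] + w[t]
    # For this assumed model the optimum predictor of x[t+lead] is rho**lead * x[t].
    t = np.arange(max_lag, n - lead)
    err = x[t + lead] - rho**lead * x[t]
    # Time averages stand in for averages over the distribution parameters.
    return np.array([np.mean(err * x[t - s]) for s in range(max_lag + 1)])

corr = prediction_error_correlations()
print(corr)  # every entry should be close to zero
```

All the averaged products are small compared with the variance of the series, which is the discrete counterpart of the statement that (3.51) vanishes for positive lag.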


3.6 Computation of Specific Filter Characteristics 


It may be of interest to work out several filter characteristics both for
lead and for lag. We shall assume that $\varphi_{12}(t)$ is identically zero. As for
$\varphi_{22}(t)$, we shall take a case which, although not formally contained in
the theory we have given, constitutes a limiting case of it, and one of the
greatest importance in practice. This is the case in which the noise input
is due to a shot effect and has an equipartition of power in frequency.
Theoretically, of course, this is not strictly realizable, as it would demand
an infinite power; practically, as in the case of Planck's law in optics,
it may hold within the limits of observation up to frequencies of a
magnitude so great that they are no longer of interest for our particular
problem. Thus we shall put $\Phi_{22}(\omega) = \epsilon^2$. As to $\Phi_{11}(\omega)$, it will depend on
the particular problem considered. For an example, let us consider the
case $\Phi_{11}(\omega) = 1/(1 + \omega^2)$. Then


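\[ H(\omega) = \frac{1}{1+\omega^2}; \tag{3.60} \]

\[ \Phi(\omega) = \epsilon^2 + \frac{1}{1+\omega^2} = \frac{1+\epsilon^2+\epsilon^2\omega^2}{1+\omega^2}; \tag{3.601} \]

\[ \Psi(\omega) = \frac{\sqrt{1+\epsilon^2} + \epsilon i\omega}{1+i\omega}; \tag{3.602} \]

\[ \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{H(u)}{\overline{\Psi(u)}}\,e^{iu(t+\alpha)}\,du
 = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{1}{\epsilon+\sqrt{1+\epsilon^2}} \left[\frac{1}{1+iu} + \frac{\epsilon}{\sqrt{1+\epsilon^2}-\epsilon iu}\right] e^{iu(t+\alpha)}\,du \tag{3.603} \]

\[ = \begin{cases}
 \dfrac{e^{-(t+\alpha)}}{\epsilon+\sqrt{1+\epsilon^2}} & (t > -\alpha) \\[2ex]
 \dfrac{e^{\sqrt{1+\epsilon^2}\,(t+\alpha)/\epsilon}}{\epsilon+\sqrt{1+\epsilon^2}} & (t < -\alpha).
 \end{cases} \tag{3.604} \]

If $\alpha$ is positive, this gives us

\[ k(\omega) = \frac{1+i\omega}{\sqrt{1+\epsilon^2}+\epsilon i\omega} \int_0^{\infty} \frac{e^{-(t+\alpha)}}{\epsilon+\sqrt{1+\epsilon^2}}\,e^{-i\omega t}\,dt
 = \frac{1+i\omega}{\sqrt{1+\epsilon^2}+\epsilon i\omega}\cdot\frac{e^{-\alpha}}{(\epsilon+\sqrt{1+\epsilon^2})(1+i\omega)}
 = \frac{e^{-\alpha}}{(\epsilon+\sqrt{1+\epsilon^2})(\sqrt{1+\epsilon^2}+\epsilon i\omega)}. \tag{3.605} \]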
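
The lead-filter computation for this example can be checked numerically. The following sketch assumes the spectra of this section, $\Phi_{11}(\omega) = 1/(1+\omega^2)$ and $\Phi_{22}(\omega) = \epsilon^2$; it verifies that the factor $\Psi(\omega)$ reproduces $\Phi(\omega)$ on the real axis and compares the closed form for $k(\omega)$ with a direct quadrature of the defining integral. The function names and the quadrature grid are illustrative choices, not part of the text.

```python
import numpy as np

def phi(w, eps):
    """Total spectrum: shot-noise level eps**2 plus message spectrum 1/(1+w^2)."""
    return eps**2 + 1.0 / (1.0 + w**2)

def psi(w, eps):
    """Factor of phi with zero and pole in the upper half-plane."""
    s = np.sqrt(1.0 + eps**2)
    return (s + 1j * eps * w) / (1.0 + 1j * w)

def k_lead_closed(w, eps, alpha):
    """Closed-form lead filter characteristic (alpha > 0)."""
    s = np.sqrt(1.0 + eps**2)
    return np.exp(-alpha) / ((eps + s) * (s + 1j * eps * w))

def k_lead_numeric(w, eps, alpha, t_max=40.0, n=40001):
    """Same quantity by trapezoidal quadrature over t >= 0."""
    s = np.sqrt(1.0 + eps**2)
    t = np.linspace(0.0, t_max, n)
    h = np.exp(-(t + alpha)) / (eps + s)          # time function for t > -alpha
    integrand = h * np.exp(-1j * w * t)
    dt = t[1] - t[0]
    integral = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / psi(w, eps)

for w in (0.0, 0.7, 3.0):
    print(w, k_lead_closed(w, 0.5, 1.0), k_lead_numeric(w, 0.5, 1.0))
```

The two columns should agree to several decimal places; the choice $\epsilon = 0.5$, $\alpha = 1$ is arbitrary.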


3.7 Lagging Filters 


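If $\alpha$ is negative, we must have recourse to approximations. For
example, let us notice that

\[ e^{i\alpha\omega} = \lim_{\nu\to\infty} \left(\frac{1 + (\alpha i\omega/2\nu)}{1 - (\alpha i\omega/2\nu)}\right)^{\nu}. \tag{3.70} \]

Thus

\[ \frac{H(\omega)}{\overline{\Psi(\omega)}}\,e^{i\alpha\omega} \approx \left(\frac{1 + (\alpha i\omega/2\nu)}{1 - (\alpha i\omega/2\nu)}\right)^{\nu} \frac{1}{(1+i\omega)(\sqrt{1+\epsilon^2} - \epsilon i\omega)}
 = \sum_{\mu=1}^{\nu} \frac{A_\mu}{[1 - (\alpha i\omega/2\nu)]^{\mu}} + \frac{B}{1+i\omega} + \frac{C}{\sqrt{1+\epsilon^2} - \epsilon i\omega}, \tag{3.705} \]

where simultaneous equations for the $A_\mu$, $B$, and $C$ may be obtained,
for example, by substituting different values for $\omega$ in either the equation
given or in its derivatives. Thus if $\nu$ is 1 and $\alpha \ne -2$, we have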


(A, + B)V1I+E@+C=1; (3.71) 


AVTFe —) +B(-e-S vir) +0(1~$) we (3.715) 


a 
3? 
yielding
7a ee, See 
ee) yes 
(3.73) 


In general,
Hel Fae eo aer iee) 
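and in this special case

\[ k(\omega) = \frac{2(1+i\omega)}{\left(1+\dfrac{2}{\alpha}\right)\left(\sqrt{1+\epsilon^2} - \dfrac{2\epsilon}{\alpha}\right)(\sqrt{1+\epsilon^2}+\epsilon i\omega)\left(1 - \dfrac{\alpha i\omega}{2}\right)}
 + \frac{1 - (\alpha/2)}{\left(1+\dfrac{\alpha}{2}\right)(\epsilon + \sqrt{1+\epsilon^2})(\sqrt{1+\epsilon^2} + \epsilon i\omega)}. \tag{3.75} \]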

In this work, there is room for a considerable amount of freedom in
selecting the rational operator which is taken to approximate $e^{i\alpha\omega}$. The
problem of determining such an operator is the well-known one of 
designing pure delay networks and has been discussed at length in the 
Bell System Technical Journal and elsewhere. Given $\Phi(\omega)$, each such
design leads to the determination of a rational filter characteristic with 
no singularities below the real axis. The realization of such real functions 
as the voltage-transfer-ratio characteristics of networks is also, at least 
in large measure, established engineering practice and may be found in 
such books as Guillemin’s Communication Networks. 
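
The partial-fraction step for the simplest rational lag, $\nu = 1$, can be sketched numerically. The code below is an illustration under the assumptions of the example in Section 3.6 ($\Phi_{11} = 1/(1+\omega^2)$, $\Phi_{22} = \epsilon^2$, $\alpha < 0$, $\alpha \ne -2$); the coefficients are obtained here as residues at the three simple poles, which is one way of solving the simultaneous equations mentioned above and not necessarily the route taken in the text. Function names are hypothetical.

```python
import numpy as np

def coefficients(eps, alpha):
    """Residues A, B, C of (1 + a*x/2) / ((1 - a*x/2)(1 + x)(s - eps*x)), x = i*omega."""
    s = np.sqrt(1.0 + eps**2)
    A = 2.0 / ((1.0 + 2.0 / alpha) * (s - 2.0 * eps / alpha))
    B = (1.0 - alpha / 2.0) / ((1.0 + alpha / 2.0) * (s + eps))
    r = alpha * s / (2.0 * eps)
    C = (1.0 + r) / ((1.0 - r) * (1.0 + s / eps))
    return A, B, C

def residual(w, eps, alpha):
    """|left side - right side| of the partial-fraction identity at frequency w."""
    s = np.sqrt(1.0 + eps**2)
    x = 1j * w
    A, B, C = coefficients(eps, alpha)
    lhs = (1 + alpha * x / 2) / ((1 - alpha * x / 2) * (1 + x) * (s - eps * x))
    rhs = A / (1 - alpha * x / 2) + B / (1 + x) + C / (s - eps * x)
    return abs(lhs - rhs)

def k_lag(w, eps, alpha):
    """Filter characteristic: keep the A and B terms (singular only above the
    real axis) and multiply by 1/Psi."""
    s = np.sqrt(1.0 + eps**2)
    x = 1j * w
    A, B, _ = coefficients(eps, alpha)
    return (1 + x) / (s + eps * x) * (A / (1 - alpha * x / 2) + B / (1 + x))

eps, alpha = 0.5, -1.0
A, B, C = coefficients(eps, alpha)
print(A, B, C, (A + B) * np.sqrt(1 + eps**2) + C)
```

The last printed quantity is the left side of (3.71), which should equal 1. With $\nu = 1$ the delay network is crude, so `k_lag` should be read as a check on the algebra rather than as a good lagging filter.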

An alternative approach to filter design may be found in the use of
Laguerre functions or similar combinations of algebraic and exponential
functions to approximate to

\[ \int_{-\infty}^{\infty} \frac{H(u)}{\overline{\Psi(u)}}\,e^{iu(t+\alpha)}\,du \quad (0 < t < \infty). \tag{3.76} \]

Such a method will probably have no significant advantages over
methods depending on an approximation to $e^{i\alpha\omega}$.

The detailed design of a filter involves certain choices of constants 
which must be justified economically. In general, it does not pay to 
eliminate a small error from a quantity when there is a large irremovable 
error in it. The irremovable error in the mean square of the performance 
of a filter, as we have seen, is

\[ \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{\Phi_{11}(\omega)\Phi_{22}(\omega)}{\Phi_{11}(\omega) + \Phi_{22}(\omega)}\,d\omega. \tag{3.77} \]


This minimum error is attainable only in a filter having an infinite time 
of delay. There is manifestly no use in increasing the delay time of a 
filter if the part of the error 

2 

| dw 


~f{ {eu(o) — (a) 
(8.78) 


due to the finiteness of this delay is already small compared with the 
intrinsic error. 
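
To give a sense of the size of the irremovable error, it can be evaluated for the example of Section 3.6. The sketch assumes $\Phi_{11}(\omega) = 1/(1+\omega^2)$ and $\Phi_{22}(\omega) = \epsilon^2$; for these spectra the integral has the elementary value $\epsilon / (2\sqrt{1+\epsilon^2})$, which is derived here for illustration and is not quoted from the text. The integration limits and grid are arbitrary choices.

```python
import numpy as np

def irremovable_error_numeric(eps, w_max=2000.0, n=400_001):
    """(1/2pi) * integral of phi11*phi22/(phi11+phi22), truncated to |w| <= w_max."""
    w = np.linspace(-w_max, w_max, n)
    p11 = 1.0 / (1.0 + w**2)
    p22 = eps**2
    integrand = p11 * p22 / (p11 + p22)
    dw = w[1] - w[0]
    integral = dw * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / (2.0 * np.pi)

def irremovable_error_closed(eps):
    """Elementary value of the same integral for the assumed spectra."""
    return eps / (2.0 * np.sqrt(1.0 + eps**2))

for eps in (0.1, 0.5, 2.0):
    print(eps, irremovable_error_numeric(eps), irremovable_error_closed(eps))
```

The small discrepancy between the two columns comes from truncating the slowly decaying tail of the integrand. Comparing this floor with the extra error of a finite-delay design is the economic test described above.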


1 = tot * H(u) twe 
ee ee dt Ls ER du 








3.8 The Determination of Lag and Number of Meshes in a Filter 


If we thus fix on the reasonable delay for a filter, there is still another 
error dependent on the fact that the theoretically optimum design is 
not, in fact, realizable with a finite network of resistances, capacities, 
and inductances. This is the error implicit in our approximation to
$e^{i\alpha\omega}$. Again, such an error may be estimated, and there is no point in
reducing this part of the error to a level substantially below that of the 
error implicit in the delay already chosen. It is this last error which 
determines the number of meshes and parts appropriate to the complete 
filter network. Once this final error is decided upon, the rational voltage 
ratio characteristic of the network is determined, and its design is a 
matter of known technique. 

The filter theory here presented, like our prediction theory, though
it is adapted to the use of electrical circuits, may be carried out as a
purely numerical computation, as may be indicated in certain studies of
meteorological or geophysical time series, or may be applied to mechanical
structures, as in the smoothing of the performance of hand-cranks
and servo-mechanisms. In the latter cases, the large time scale of the
process may put the use of purely passive electrical networks out of
question, as really large inductances are not practical. On the other
hand, such filters may be realized mechanically; or they may be realized
electrically by the use of active networks, in which vacuum tubes or
other amplifying devices appear, but in which inductances do not
appear; or they may be realized by a combination of mechanical and 
electrical devices. In all cases, for a shot-effect noise, the information 
required for filter design consists in the spectrum of the message to be 
transmitted, the noise level, and the permissible delay. 
3.9 Detecting Filters for High Noise Level


An especially interesting and important type of filter is that employed 
in the observation of very faint messages when the noise level is very 
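high. Here certain simplifying assumptions may be made. We shall have

\[ \Phi_{22}(\omega) = 1; \quad \Phi_{11}(\omega) = \epsilon F(\omega); \quad \Phi(\omega) = 1 + \epsilon F(\omega) \tag{3.900} \]

if the noise is suitably normalized. To factor $\Phi(\omega)$, we introduce the
function

\[ G(\omega) = \frac{1}{2\pi}\int_0^{\infty} e^{-i\omega t}\,dt \int_{-\infty}^{\infty} F(u)\,e^{iut}\,du. \tag{3.901} \]

Then approximately

\[ \Phi(\omega) = |1 + \epsilon G(\omega)|^2; \tag{3.9015} \]

and if we put

\[ 1 + \epsilon G(\omega) = \Psi(\omega), \tag{3.902} \]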


we shall have approximately 
H(w)  &F (w)e** 
Teo) a 1+ G(w) 

To a first degree of approximation, 


= eF (wel — G(o)]. (3.9025) 


kw) = <= f * grivt af £ : F(u)et*4® du 


= give = 7. ett r F(we™ du. (3.903) 


This means that, if we allow a long delay and neglect the effect of this
delay, the frequency character of the desired filter will be $\epsilon F(\omega)$ itself.
The next degree of approximation is


mden ng iD, le) 
ku) Se Bo) 1 +eF(s) 


It will be seen that this function is qualitatively very similar to $F(\omega)$,
and that, except for a numerical factor, a filter with frequency char- 
acteristic F(w) will not perform too badly for a considerable range of 
large noise intensities. Of course, the criterion which we have used for 
the performance of a filter is perhaps not so obviously natural at high 
noise levels as at low. Nevertheless, it is still not a valueless one. 





° (3.9035) 


3.91 Filters for Pulses 


The problem of filtering at high noise levels arises chiefly in connection 
with the detection of extremely faint messages, at the threshold of 
detectability. A closely related problem is that of the proper pulse form
for such messages. For many purposes, a message to be detected by such 
means must be accurately localizable in time and, for the avoidance of 
incoming and outgoing interference, must also be accurately localizable 
in frequency. Let the time form of such a pulse be f(t), and its Fourier 
transform (the frequency form of the pulse) be g(w). Here 


ene) fis0 miut gy (3.910) 
gu Soe sa | Fe : : 
A reasonable measure of the time spread of such a pulse will be

\[ \frac{\int_{-\infty}^{\infty} t^2\,|f(t)|^2\,dt}{\int_{-\infty}^{\infty} |f(t)|^2\,dt}. \]

Thus a reasonable measure of the combined time and frequency spread
will be


[lel [? dt fw? | ou) Pau 
SaaS ee eee 
filo Pat fo oe Paw 

fl \to Pat 


Assuming the total energy of the pulse to be fixed, the problem of
minimizing the combined time and frequency spreads reduces to that of
minimizing

\[ \int_{-\infty}^{\infty} \left( t^2\,|f(t)|^2 + |f'(t)|^2 \right) dt, \tag{3.9115} \]

while $\int_{-\infty}^{\infty} |f(t)|^2\,dt$ is constant. If we now vary $f(t)$ by adding to it
$\delta f(t)$, we see that we should have


(3.9105) 





(3.911) 


- afl) a 
dt dt 


] of (t) dt (3.912) 


0= ui “OO d+ (f(t) at 


is d*f(t) 
i nf ' [erm -wie 
whenever

\[ \int_{-\infty}^{\infty} f(t)\,\delta f(t)\,dt = 0. \tag{3.9125} \]

This leads to the Euler-Lagrange equation,

\[ t^2 f(t) - \frac{d^2 f(t)}{dt^2} = \lambda f(t), \tag{3.913} \]

and the $L^2$ solution of this, with smallest characteristic value, will be

\[ f(t) = \exp\left(-\frac{t^2}{2}\right). \tag{3.9135} \]

This is accordingly the desired pulse shape, and the desired $k(\omega)e^{-i\alpha\omega}$
will be

\[ F(\omega) = \text{const}\,\exp\left(-\tfrac{1}{2}\omega^2\right). \tag{3.914} \]

For detection of faint pulses of this form, we accordingly wish a filter
with characteristic approximating $e^{-\omega^2/2}$. Fortunately, the time
characteristic of such a filter will be nearly


(const) exp | ‘ies mi =, (3.9145) 


and this may be approximated by taking 


f* exp (=) (const) exp [- fosaer). (3.915) 


On the frequency scale this yields us 


1 
7 ‘ 
(Vz - ie) 
and our problem is completely solved. 
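
The minimizing property of the Gaussian pulse can be checked numerically. The sketch below evaluates the ratio of $\int (t^2 f^2 + f'^2)\,dt$ to $\int f^2\,dt$ on a grid for the pulse $\exp(-t^2/2)$ and for two other Gaussians of different width; for a real Gaussian of standard deviation $\sigma$ this ratio works out to $(\sigma^2 + \sigma^{-2})/2$, a value derived here for illustration. The grid and the comparison pulses are arbitrary choices.

```python
import numpy as np

def spread(f, t):
    """(integral of t^2 f^2 + f'^2) / (integral of f^2) for real f on a uniform grid."""
    dt = t[1] - t[0]
    df = np.gradient(f, dt)                      # finite-difference derivative
    num = np.sum(t**2 * f**2 + df**2) * dt
    den = np.sum(f**2) * dt
    return num / den

t = np.linspace(-12.0, 12.0, 24001)
gauss = np.exp(-t**2 / 2.0)      # sigma = 1, the claimed minimizer
wide = np.exp(-t**2 / 8.0)       # sigma = 2
narrow = np.exp(-2.0 * t**2)     # sigma = 1/2

print(spread(gauss, t), spread(wide, t), spread(narrow, t))
```

The first value is the smallest of the three: widening the pulse raises the time term, and narrowing it raises the derivative (frequency) term by the same amount.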


3.92 Filters Having Characteristics Linearly Dependent on Given 
Characteristics 


Let us now explore the problem of the filter consisting in a linear
combination of fixed operators with variable coefficients. Let us seek to 


ii fie ~ f- 
~ ie ae -T 


" « 2 
— Eas ft — 2) + ot — Makals)| at 


(const) (3.9155) 
= 911(0) — ar| f fesii(a@ + 7) + vole + 7)] x &, dK;(t) 


+ f a ay, dK, (c) of ~ a, dKz(r)e(r — c). (3.920) 
Let us use the notation 
f ” ete aK u(t) = kyo) (R= 1,-++,n). (3.9208) 
Then, using reductions now familiar to us, we may write 
1 c 
in =. Sxl) de 
n x 1 2 2 ae 
— ar { Dae J tere) + dalorlehate) de 
1 TJ ~ 2 
J n n 4 ~o 
+ 5- Lay Lae f  P)kj(o)e(@) do. (3.921) 
wy. 1 co 
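If our $k_k(\omega)$ are so normalized that

\[ \frac{1}{2\pi}\int_{-\infty}^{\infty} k_j(\omega)\,\overline{k_k(\omega)}\,\Phi(\omega)\,d\omega = \begin{cases} 1 & (j = k) \\ 0 & (j \ne k), \end{cases} \tag{3.9215} \]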


this gives us 


= =f" $11 (w) dw — 
oR] Sat [” wie) + e2)lehle) dol + S| ax? 
2r J —« 1 


2 


iz = i, iw) die — z | = 1 (11 (w) + Pip (w) Jey (w) do 


2 





+Ela-> f” Pre) +eullehe)do| . 6.922) 
1 Tye 








This will be a minimum when and only when 
1 bf oo 
a= =f Pu&)+4ullleF@) de, (6.9225) 
and the minimum will be 


13 ) n 
5c Jf. sl) do ~ E 


2 





= Ao ($11) + S12 @)Je**kx(@) du 





(3.923) 
Under these conditions, 


E ark (w) = (3.9235) 
oa EG Pls 1 ££? P11) + P22) iaue (nyo) | 
Foy LE OHO) a fee AONE) de 


which is the formal development of

\[ \frac{\Phi_{11}(\omega) + \Phi_{12}(\omega)}{\overline{\Psi(\omega)}}\,e^{i\alpha\omega} \]

in the functions $k_k(\omega)\cdot\Psi(\omega)$, multiplied by $1/\Psi(\omega)$. If the set of functions
$k_k(\omega)\cdot\Psi(\omega)$, all of which are free from singularities in the half-plane
below the real axis, is complete in the set of such functions [and if it
is not, it may be made so by adjoining other functions $k_k(\omega)\cdot\Psi(\omega)$,
which may be taken so as to satisfy (3.9215)], we obtain as the formal
limit for $\sum_1^n a_k k_k(\omega)$ the expression

\[ \frac{1}{2\pi\Psi(\omega)}\int_0^{\infty} e^{-i\omega t}\,dt \int_{-\infty}^{\infty} \frac{\Phi_{11}(u) + \Phi_{12}(u)}{\overline{\Psi(u)}}\,e^{iu(t+\alpha)}\,du, \tag{3.924} \]

which we have already seen to solve the filter problem as stated, without
any reference to a specific set of functions $k_k(\omega)$.

Let it be noted that we have here assumed the factorability of $\Phi(\omega)$
into $|\Psi(\omega)|^2$, where $\Psi(\omega)$ is free from singularities in the lower
half-plane. If this is not the case, as we have seen,

\[ \int_{-\infty}^{\infty} \frac{|\log \Phi(\omega)|}{1+\omega^2}\,d\omega = \infty. \tag{3.9245} \]

Under these circumstances the set of functions $\sqrt{\Phi(\omega)}\,k(\omega)$, where $k(\omega)$
is free from singularities in the lower half-plane, is closed $L^2$. Otherwise
there will exist a function $\sqrt{\Phi(\omega)}\,l(\omega)$ of class $L^2$, not equivalent to zero,
such that

oe * @(a)k(a)l(w) deo = 0 (3.925) 
for every k(w) for which 
4 ” B(w) | kw) [Fd < (3.9255) 


and which has no singularity in the lower half-plane. This will certainly 
be the case for every function k(w) of class L? having no singularity in 
the lower half-plane. It will follow at once that 


P(w) - L(w) (3.926) 
will be orthogonal to every function of class $L^2$ vanishing for negative
arguments, and will itself vanish for positive arguments. Thus 


* log | S(w) [ + log | U(w) | 
fee dw > 


This would mean, by (3.9245), that 


is log| V@@N)| DL 
ae 1+ ou eae 


— 0, (3.9265) 


(3.927) 


which is manifestly false, as the function $\sqrt{\Phi(\omega)}\,l(\omega)$ belongs to $L^2$.
Thus no such $l(\omega)$ exists, and the set of functions $\sqrt{\Phi(\omega)}\,k(\omega)$ is closed.
In other words, we may choose the functions $\sqrt{\Phi(\omega)}\,k_n(\omega)$ in such a
way that

$13 (w) + Py2(w) giao 
? 


w 5 wo) = f 
vVa( x axky(w) iG (3.9275) 
and that formally 
- _ Su) + F12(%) soy 
x axk,(w) = ee ene, (3.928) 


This means that in such a case the performance of a filter for a finite 
delay may be made to approximate as nearly as we wish the performance 
of a filter for an infinite delay. This is quite reasonable, since for such 
messages and noises the entire future of the message-plus-noise is 
determined by its past, and nothing new ever happens. 

Let it be noted that the situation depends on the factorability or
non-factorability of $\Phi(\omega)$, which involves both the message and the
noise, and not on the factorability or non-factorability of a term con- 
taining the message alone. Even with a perfectly predictable message, 
the presence of an imperfectly predictable noise makes the filtering 
problem a significant one. 

The problem which we have just solved is that of the design of a filter 
having a character which is the sum of fixed operators with adjustable 
coefficients. The functions $k_n(\omega)$ may be any functions obtained by
normalizing in the proper sense the functions $(1 + ai\omega)^{-n}$ $(n = 0, 1, 2, \cdots)$,
for then they themselves [and a fortiori the functions $k_n(\omega)$] will be
closed in the set of all functions of L? which are free from singularities 
in the half-plane below the real axis. Since we have an algorithm for 
obtaining the coefficients, the filter-design problem is solved. Such 
adjustable filters are of the greatest value in experimental installations, 
in which the adjustability may actually be realized by the turning of 
rheostats or potentiometers, or even in permanent installations in which
the variety of work to be undertaken is very great. Of course, they will 
ordinarily be more complicated than the fixed-constant sets having the 
minimum number of elements for the same performance. 


3.93 Computation of Filter: Résumé 


Let us then sum up the mathematical stages in the design of a filter of
fixed or variable characteristic. The first stage is the computation of the
even function $\varphi_{11}(t)$. On the basis of this, with a proper choice of the
scale constant $P$, the coefficients


- Peer. 
a | Pai (o) ee te? a de (3.930) 


are then computed. Then the Cesaro sum 


Le ( ly ) -2vitanwP [ 2ré tan“uP 

1 — —Je i $11 (ue du (3.9305) 

2r va—N N =e, 
is computed as an approximate value for $\Phi_{11}(\omega)$. This is then written in
the quotient form
olynomial in w*P? 
Tp (3.931). 

and the numerator and denominator are factored into linear factors. 
In factoring the numerator, algebraic equations of high order may have 
to be solved for their complex roots, and the use of a device such as the 
Isograph* of the Bell Telephone Laboratories is indicated. Then, by 
selecting in both numerator and denominator only those roots with 
positive imaginary part, the function Y(w) is determined. In the case of 
a lead filter, all is plain sailing from here on. In the case of a lag filter,
through considerations such as those we have already indicated, the
proper lag is determined, as well as the degree to which it is worth while
to imitate $e^{i\alpha\omega}$ in determining the proper approximation to this lag. Then
we call on the existing technique of delay-mechanism design to realize 
this approximation in terms of the simplest rational characteristic pos- 
sible. When this approximation is known, we have already given the 
formulae which determine the final filter characteristic. Finally, whether 
for lead or for lag, we have to realize the (now determined) characteristic 
by a network, which is then subject to the many known tricks of network 
transformation. 
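
The factoring stage of this résumé, selecting from numerator and denominator only the roots with positive imaginary part, can be sketched as follows. The sketch assumes the spectrum is already given as a ratio of real polynomials in $\omega$ that is positive on the real axis with no real roots; it uses a general-purpose polynomial root finder in place of the isograph, and the function name is hypothetical.

```python
import numpy as np

def spectral_factor(num, den):
    """Return a callable psi with |psi(w)|^2 = num(w)/den(w) on the real axis.

    num, den: real polynomial coefficients in w, highest power first.
    Only roots with positive imaginary part are kept, so psi has no zeros
    or poles in the lower half-plane.
    """
    zeros = [z for z in np.roots(num) if z.imag > 0]
    poles = [p for p in np.roots(den) if p.imag > 0]
    gain = np.sqrt(num[0] / den[0])

    def psi(w):
        w = np.asarray(w, dtype=complex)
        top = np.prod([w - z for z in zeros], axis=0) if zeros else 1.0
        bot = np.prod([w - p for p in poles], axis=0) if poles else 1.0
        return gain * top / bot

    return psi

# Example of Section 3.6: Phi = (1 + eps^2 + eps^2 w^2) / (1 + w^2).
eps = 0.5
psi = spectral_factor([eps**2, 0.0, 1.0 + eps**2], [1.0, 0.0, 1.0])
w = np.linspace(-5.0, 5.0, 11)
print(np.abs(psi(w))**2 - (eps**2 + 1.0 / (1.0 + w**2)))  # differences near zero
```

For this example the factor coincides with $(\sqrt{1+\epsilon^2} + \epsilon i\omega)/(1 + i\omega)$. In general a spectral factor is fixed only up to a constant of unit modulus, so agreement of $|\Psi|^2$ with $\Phi$ is the meaningful check.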

* See Industrial Mathematics, by T. C. Fry, Bell System Technical Journal, Vol.
20, No. 3, July, 1941, p. 276. There are also well-known computational-algebraic
methods of achieving the same result.
So much for the fixed filter. For the variable filter, we are faced instead
with the choice of a suitable closed set of functions $1/(i\omega + a)^k$ and of
their orthogonalization with respect to the function $\Phi(\omega)$. This alone
determines the structure of the filter, while the determination of the
numbers $a_k$ in terms of $\alpha$ gives the setting of the apparatus for a desired
lead or lag.

Much less important, though of real interest, is the problem of the 
numerical filter for statistical work, as contrasted with the filter as a 
physically active piece of engineering apparatus. In the case of continuous
data, there is little new to say of this, except that the $k(\omega)$
already obtained must be translated into a $K(t)$ from which we may
evaluate $\int_0^{\infty} f(t - \tau)\,dK(\tau)$. In that particular subcase of the discrete
case in which $f$ and $g$ are independent, we follow the lines of the prediction
theory of the previous chapter and define the function $\Phi(\omega)$ in terms of
the auto-correlation coefficient


7 1 
Filry = J ON G1 ON + ee = yirted (3.9315) 


where the f, constitute the time series with which we are working. 
We further put 


$n(o) = LD vive”. (3.932) 
This will be a periodic function of period 2x. Similarly, 
I iva 
Vite = ORY 7 = 9rtuGu; P22(w) = =z Yowe ”*, (3.9325) 


As an approximation, we may use Cesaro methods as before and may put 





cd ]»| , 1 AIS ve 
be 1 Zt) (vir + Prove 7? = ps yer"? (3.933) 
kar N = 
where the factoring is so carried out that 
N 
Vo) = Dyer 40 (I{w} <0). (3.9335) 
0 


Here again the isograph may be used. We then put 


1 = —t eas 11(u)e™™ iur . 
k(w) = ive &° Bi os e“’ du (aan integer) 





(3.934) 
If now we write 
i r 
k,=— f e'’*k(w) du, 
2r -* 


we shall minimize 





: L ¥ = _- 
roe 2N + 1 zx fora = x (fru + Jr—a ) Ky . 


The minimum of this expression will be 


—S {Pi1(@) — [$11() + &()] | k(w) |?} dw. 




(3.935) 


(3.936) 


(3.937) 


CHAPTER IV


THE LINEAR PREDICTOR AND FILTER 
FOR MULTIPLE TIME SERIES 


4.0 Symbolism and Definitions for Multiple Time Series 


The difficulties of the present chapter are rendered much more considerable
by the sheer bulk of notation required to deal effectively with
multiple time series. Let us then agree on the following points:

The symbols $f_k(t)$ $(1 \le k \le n)$ shall signify messages.

The symbols $g_k(t)$ $(1 \le k \le n)$ shall signify disturbances.

The symbols $\varphi_{jk}^{mm}(t)$ shall signify correlation coefficients of messages,
or symbolically

\[ \varphi_{jk}^{mm}(t) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f_j(t+\tau)\,\overline{f_k(\tau)}\,d\tau. \tag{4.00} \]

The symbols $\varphi_{jk}^{dd}(t)$ shall signify correlation coefficients of disturbances,
or symbolically

\[ \varphi_{jk}^{dd}(t) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} g_j(t+\tau)\,\overline{g_k(\tau)}\,d\tau. \tag{4.003} \]

The symbols $\varphi_{jk}^{md}(t)$ shall signify correlation coefficients between messages
and disturbances, or symbolically

\[ \varphi_{jk}^{md}(t) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f_j(t+\tau)\,\overline{g_k(\tau)}\,d\tau. \tag{4.005} \]

Let us put

\[ \varphi_{jk}(t) = \varphi_{jk}^{mm}(t) + \varphi_{jk}^{md}(t) + \overline{\varphi_{kj}^{md}(-t)} + \varphi_{jk}^{dd}(t); \tag{4.01} \]

\[ \chi_k(t) = \varphi_{1k}^{mm}(t) + \varphi_{1k}^{md}(t). \tag{4.015} \]

We shall write

\[ \frac{1}{2\pi}\int_{-\infty}^{\infty} \Phi_{jk}(\omega)\,e^{i\omega t}\,d\omega = \varphi_{jk}(t); \tag{4.02} \]

\[ \frac{1}{2\pi}\int_{-\infty}^{\infty} X_k(\omega)\,e^{i\omega t}\,d\omega = \chi_k(t). \tag{4.025} \]

Here

\[ \Phi_{jk}(\omega) = \overline{\Phi_{kj}(\omega)}. \tag{4.03} \]

We shall put

\[ \Phi(\omega) = \left|\,\Phi_{jk}(\omega)\,\right|. \tag{4.035} \]

Where it can be done, we shall put on the real axis

\[ \Phi_{jj}(\omega) = |\Psi_j(\omega)|^2, \quad \Phi(\omega) = |\Psi(\omega)|^2, \tag{4.04} \]

where $\Psi_j(\omega)$ and $\Psi(\omega)$ are free from zeros and singularities in the lower
half-plane and are bounded at infinity in that half-plane.


4.1 Minimization Problem for Multiple Time Series 


The problem which we now wish to solve is that of the best approximation
in the least square sense to $f_1(t + \alpha)$, by the sum of a set of linear
operators on the pasts of $f_1(t) + g_1(t), \cdots, f_n(t) + g_n(t)$. In symbols,
we wish to minimize


M= Tim af, fitt+a)— = 7 tale— D1 aKa at 


= ext) —28{ E [tenet 2) +eu™e+ 1 aR} 





+4 E [O akso) [” aEO lon - 0) 


jelke=} 
+ vie"4(r — 0) + oni™4(0 — 7) + 9924(7 — 0)} 
=e) —2R | Ef” ale +) aK 
+E Ef akyo) [” aKiGen(e—o). (4.10) 


j=lkwl 
If now we let 


mat = Ef onlr— 0) dQ;(o) (r>0), (4.108) 


we get 
M = esto) - 21 EE aa, [~ enter - 2) dO} 
+ 5 E f ” aK;(c) f ” gules eagle 


= on™ (0) — ae xf 40,0) [ enle— 0) dG 
FEE fo ate) — KO) [- eal - 0) AG) — BO 


j=l k=1 
= on™) - ES 40;(0) [~ enlr — 0) data) 
aes 


te edo 2 


Son) - EE [aaj [” vale — 0) de). 4.41) 


F=1k=1 


(t— 1) +9)(t—1d1Q\(2) — Ky(0) [ at 





To minimize this expression, we may obviously put

\[ K_j(\tau) = Q_j(\tau) \quad (j = 1, \cdots, n) \tag{4.115} \]


and we have reduced our problem to that of the solution of the system of 
equations (4.105). 
Let us now put 


ne) = 7 et gQ;,(t); jw) = f é&*'dK;(t). (4.12) 
We shall have 


pS “ae — Kj) [exes — @) dfQe@) — Kel 
g=mtkeo1 YO 


= EES f_ eaWllase) — kellie) — H@)da. 4.125) 


Pa 2, Qn 
If then the Hermitian form 
>a = Biz (w) aj, (4.13) 
joi b= 


is positively defined for every value of $\omega$, $M$ cannot be minimized unless
$K_j(\tau) = Q_j(\tau)$, and this gives us a unique solution of our minimization
problem. We suppose, of course, that the Hermitian expression 


=e B52(0)q5(w)ge(@) do (4.135) 


j x = =] 
is finite. 
4.2 Method of Undetermined Coefficients


The functions $q_j(\omega)$ will be free from singularities in the lower
half-plane and will there fulfill some condition akin to boundedness. On the
other hand, the functions


Xp(w)eiae — z &ja(u)q;(v) = Hele) (4.20) 
will be free from singularities in the upper half-plane and will fulfill some
boundedness condition there. We shall have 


Xi (we — Hi(w) Sei(w) --- Sar (w) 


B(w)gi(w) = X2(w)e*** — Ho(w) $oo(w) +++ FSag(w) - (4.205) 


i 


Xn(w)e* — Haw) Son(w) +++ San(w) 
Let the cofactor of $\Phi_{ij}(\omega)$ in $\|\Phi_{ij}(\omega)\|$ be $F_{ij}(\omega)$. Then we may put


2(a)n(@) = EX) — Hy(o)Fis(o); (4.21) 


as a consequence, 


$(w)qi(w) = E Xj(w)e oF (co) + L-—_—, + H@), 215) 


= on 
where H (w) is of algebraic growth and free from singularities in the upper 
half-plane, and w, is a singularity of some F'1;(w) in the lower half-plane, 
which never has a multiplicity greater than » in any F',;(w). Moreover, 
it must be possible for (4.135) to be finite. To find such a $q_1(\omega)$, as in the
last chapter, we reduce our last equation to


X;(w)e**F 1 ;(w) +E Bus 


5 7 FH) (4.22) 
V(w) (wo — w,)* 





Hon) = E 


where H;(w) is of algebraic growth and free from singularities in the 
upper half-plane. Here we have used the fact that 


1 


————_____ (4.225) 
(co — w,)*W (a) 
has only singularities of the form 
Buy 
a (i SH) (4.23) 
(@ — w,)" 
above the real axis, while 
dla (4.235) 
VY(w) 


has no such singularities. Thus, by the technique of our last chapter, 


LX Bus 
ao) = vo) + —_—_—— 4,24) 
gi() 71 (w) (Fe) — «)'] ( 
where 

E Xj (uel (ue 
no) ah oe S| | ™ 


(4.245) 
In other words, r1(w) represents that part of 
X Xj(w)e**F;(w) 
+ (4.25) 
¥(o) 


having singularities only above the real axis, multiplied by $1/\Psi(\omega)$.
In determining this part in practice, it will be convenient for $\alpha < 0$, or
the case of a lagging filter, to approximate $e^{i\alpha\omega}$ by some rational approximate
lag characteristic, such as

\[ \left(\frac{1 + (i\alpha\omega/2n)}{1 - (i\alpha\omega/2n)}\right)^{n}. \tag{4.255} \]


It will be understood that the same approximation is used consistently 
throughout the problem. 

We have thus determined $q_1(\omega)$, except that the parameters $B_{\mu\nu}$ are
still indeterminate. We may proceed to determine $q_2(\omega)$ in a similar way,
and so on. We thus reduce the final solution of our problem to the solu- 
tion of a finite set of linear algebraic equations in a finite number of 
parameters. Since we know that our entire problem has a unique solu- 
tion, the result is that solution. 

Let us now confine ourselves to the case n = 2. Here (4.20) becomes 


aes — €1(@)g1(@) — $21(w)g2(w) = Ai); 


Xo(w)e™™ — dyo(w)gu(w) — b22(w)Q0() = Halo). °°) 


Then 
$12(w) Buy 


X2(w)e** — dyo(w)ri(w) — £ Uw) (wo — wy) 


— $22(w)go(w) = Hew). 


This gives us 


a 1 : —twt 
g2{o) = Ona) i et! dt (4.265) 


By, B12 (x) 


ns Xo(ue™ — P2(u)ri(u) — 2X (u—w,)* ¥{u) 








4 ef! du, 


oa 2 (u) 
Thus the first of equations (4.26) becomes 


Bor (w) = 


Iwt oy 
BrVo(w) Jo ° 


Buy 
Xy(w) — b11(w) [rv) +k er * 





y By _ Pia(u) 

_ | Kae — S12(u)ri(u) - E—— 

¥ (u = a)" ¥(u) | sit 
—« Va (w) 


= H,(u), (4.27) 


where $H_1(\omega)$ is free from singularities above the real axis. If we take this
equation exactly as it stands for $\alpha > 0$, or use the same approximation
(4.255) to $e^{i\alpha\omega}$ as we used in determining $r_1(\omega)$ if $\alpha < 0$, we obtain from
the partial fraction development of the left side a set of linear equations
in the quantities $B_{\mu\nu}$, which we may solve for the value of $B_{\mu\nu}$. Similar
methods may be used for $n$ greater than 2.


4.3 Multiple Prediction 


In the special case of the prediction problem as contrasted with the 
filter problem, 


X,(e) = ®,;(w) G ='l, 2, ney *). (4.31) 
Here 
~ ae AEP = iw je tula+t) 
10) = 575 f car f" veut du, (4.32) 
so that 
os 6 Nh Pe He SS tu (att) 
g(a) = mn e hdt io (ue du 
+e ——_3 ___, (4.33) 


1 (w — w,)*¥ (we) 


where the w, are the poles of 2)2(w) and $29_{w) in the upper half-plane. 
Thus 


wy = bf” got gy [7 Ble = 11) eas 
a2 ) 2rVo (w) 7 ew at f. ¥,(u) e™ du, (4.34) 
and 


©; (co) [e** — 91 (w)] — Bar (w)g2(w) = Hy (w) (4.35) 


is free from singularities above the real axis. 
4.4 Special Cases of Prediction


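Let us try this method in a particular example of prediction. Let

\[ \Phi_{11}(\omega) = \Phi_{22}(\omega) = \frac{1}{1+\omega^2}; \quad \Phi_{21}(\omega) = \frac{\epsilon}{(1+i\omega)^2}; \quad \Phi_{12}(\omega) = \frac{\epsilon}{(1-i\omega)^2} \quad (\epsilon < 1). \tag{4.41} \]

Here

\[ \Psi_1(\omega) = \Psi_2(\omega) = \frac{1}{1+i\omega}; \quad \Psi(\omega) = \frac{\sqrt{1-\epsilon^2}}{(1+i\omega)^2}; \tag{4.42} \]

and

\[ r_1(\omega) = e^{-\alpha}\,[1 + \alpha(1 + i\omega)]. \tag{4.43} \]

The only possible $\omega_\nu$ is $i$, with a multiplicity never greater than 2. Thus

\[ q_1(\omega) = A + B(1 + i\omega). \tag{4.44} \]

The only values of $q_1(\omega)$ for which our Hermitian form will be bounded
will be

\[ q_1(\omega) = A. \tag{4.45} \]

It may readily be seen that this will yield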


qo (co) == e(1 + tw) _ pte) f pg f — = et! du = 0. (4.46) 


Thus the prediction of $f_1(t)$ in this instance depends on its own past
alone, and not at all on that of $f_2(t)$. The prediction problem reduces to
that of an earlier section (2.1), and we have $A = e^{-\alpha}$.
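
The quantities of this example can be verified numerically. The sketch below assumes the spectra $\Phi_{11} = \Phi_{22} = 1/(1+\omega^2)$, $\Phi_{21} = \epsilon/(1+i\omega)^2$, $\Phi_{12} = \epsilon/(1-i\omega)^2$; it checks that the determinant equals $|\Psi(\omega)|^2$ and compares $e^{-\alpha}[1 + \alpha(1+i\omega)]$ with a direct quadrature, using the fact that $\Psi$ is the transform of $\sqrt{1-\epsilon^2}\,t\,e^{-t}$ on $t > 0$. Function names and the grid are illustrative.

```python
import numpy as np

def determinant(w, eps):
    """Phi11*Phi22 - Phi12*Phi21 for the assumed spectra."""
    p11 = p22 = 1.0 / (1.0 + w**2)
    p21 = eps / (1.0 + 1j * w)**2
    p12 = eps / (1.0 - 1j * w)**2
    return p11 * p22 - p12 * p21

def psi(w, eps):
    return np.sqrt(1.0 - eps**2) / (1.0 + 1j * w)**2

def r1_closed(w, alpha):
    return np.exp(-alpha) * (1.0 + alpha * (1.0 + 1j * w))

def r1_numeric(w, eps, alpha, t_max=50.0, n=50001):
    """(1/psi) * integral over t >= 0 of psi_time(t + alpha) * exp(-i w t)."""
    t = np.linspace(0.0, t_max, n)
    psi_time = np.sqrt(1.0 - eps**2) * (t + alpha) * np.exp(-(t + alpha))
    integrand = psi_time * np.exp(-1j * w * t)
    dt = t[1] - t[0]
    integral = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / psi(w, eps)

w = np.linspace(-4.0, 4.0, 9)
print(np.max(np.abs(determinant(w, 0.6) - np.abs(psi(w, 0.6))**2)))
print(r1_closed(1.5, 0.8), r1_numeric(1.5, 0.6, 0.8))
```

Note that the result for $r_1$ does not depend on $\epsilon$: the factor $\sqrt{1-\epsilon^2}$ cancels between the time function and $1/\Psi$.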

Let us interchange the roles of $f_1(t)$ and $f_2(t)$ in this same example.
Let 


(0) =4n0(e) = 3 i dale) = Gi 
®y2(w) = aT (e <1). (4.465) 
Again, eae 
HW) = he) oi Ww)= Ts; aan 
and 
nw) = Il + a(l + te). (4.475) 


The only possible w, is ¢, with a multiplicity never greater than 1. Thus 
gw) = e* + BL + w). (4.48) 
The only value of g(w) for which our Hermitian form will be bounded 
will be 

gi(w) =e. (4.485) 
This in turn yields 


- e(1 ~ tw) “ —iwe woe! = tu tau _ ga) tud 
a=". Jf . af a+ int © di 


_ fl t+ w) —twt i: ava -itsl feu _ wr) tut 
Qe 1 ; de aa ae aa 


Que 


= (1 + tw) —— iti 
= Qace™. (4.49) 
The formulae he now become 





1 zs 7. 02 (et —e*)— qo = Hi(w); (4.491) 
TE ah OP) — pate = Haw); 4402) 


and it may readily be seen that the H,(w) and H2(w) thus defined are 
free from singularities above the axis of reals. 
4.5 A Discrete Case of Prediction 


It may be as well to choose our third example as a case in the predic- 
tion of discrete time series. Accordingly, let 
@=1; F1() = Sy2(w) = 1; 
Po (w) = «eit; P12(w) =e” {e < 1). (4.50) 
Here 


Wo) = VI-— 2; Glo) = d29(0) = 1. (4.505) 


In a discrete case such as this, we get 


—twn i D su(ntl) 
niga i) ames si “— " 


ae ze eviwn f git(nth) go, 
- n=0 —* 

= 0. (4.51) 

Furthermore, ¥19(w) and Y23(w) are bounded above the real axis. Thus 


gi(w) = 0. (4.515) 
This leads to 
l 2 
ao) = 5- Ee es e* (ee du =~ (4.52) 


On the other hand, let 


a@=1; %;(w) = S22(w) = 1; 
$2;(w) = € 3 Dig(w) = ee (€ <1). (4.525) 
Here again 
Vi) = V1 — é; W1(w) = Yew) = 1. (4.53) 
As before, 
T1(w) = 0. (4.535) 


On the other hand, 412(w) behaves like a constant multiple of e*” ahove 
the real axis. Thus we put 


qi(#) = r1(w) + A + Be” = A + Be, (4.54) 


g2(w) = = z P is f. e*(e™ — A — Be) e™™ du 

= —Be. (4.545) 
Thus 
11 (w)[e** — 91 (w)] — P:2(w)g2(w) = e* — A — Be** + Bee; (4.55) 
and if this is to be a possible H(w), 

B= 0. 

Since f(t) now must be predicted on its own merits, 
a = orton J" wi(uee™ du = 0, (4.555) 
so that, in this case, no prediction whatever is possible. 
4.6 General Technique of Discrete Prediction 


These trivial cases are yet enough to illustrate the technique of 
multiple prediction in the discrete case, which will be the typical case 
of the business statistician, the meteorologist, and the geophysicist. As 
always, the first step to be taken is the determination of the $\varphi_{ij}$'s and
all related functions. Here there is a technical point worth consideration.
We define $\varphi_{ij}^{mm}(\nu)$ as

\[ \varphi_{ij}^{mm}(\nu) = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{n=-N}^{N} f_i(n+\nu)\,\overline{f_j(n)}. \tag{4.60} \]
However, an infinite process like this cannot be carried out in completeness;
besides, there is a great convenience in regarding $\Phi_{ij}^{mm}(\omega)$ as a
terminating series in positive and negative powers of $e^{i\omega}$. Is it then
advisable to take

\[ \frac{1}{2N+1} \sum_{n=-N}^{N} f_i(n+\nu)\,\overline{f_j(n)} \tag{4.605} \]

as an approximate representation of $\varphi_{ij}^{mm}(\nu)$? The answer is no. The
most essential property of a single $\varphi_{jj}^{mm}(\nu)$ is that it may be written in
the form

\[ \int_{-\pi}^{\pi} e^{i\nu\omega}\,d\Lambda(\omega) \tag{4.61} \]

where $\Lambda(\omega)$ is monotonically increasing. Similarly, the quadratic form
LL ei" (u — v)a,a, (4.615) 
LP ae od 


must be non-negative for every w. This is not true if we replace
$\varphi_{ij}^{mm}(\mu - \nu)$ by

\[ \frac{1}{2N+1} \sum_{n=-N}^{N} f_i(n + \mu - \nu)\,\overline{f_j(n)}. \tag{4.62} \]


On the other hand, this is true if we replace »,;""(s — v) by the 
approximating function 


ee awe _—__ 
2N +1 yen fila + w — v)fi(n). (4.625) 
-Ngntu-»gN 


To take the picts case, 


p> 0,2, ———~ f(nt+p—v)fin) 


2N . 1 yen 
—Noata—rsn 





1 2 
= > 
2N +1 2 guteiyss afin te)) 20 
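The non-negativity of the windowed estimate (4.625) can be checked numerically. The following is a minimal sketch, assuming a real-valued record of length 2N + 1 filled with hypothetical random data; the function names are illustrative and do not come from the text. It builds the matrix of the quadratic form (4.615) from the windowed estimate and inspects its smallest eigenvalue.

```python
import numpy as np

def windowed_autocorr(f, max_lag):
    """Estimate (4.625): f(n + lag) and f(n) are both confined to the record."""
    f = np.asarray(f, dtype=float)
    length = len(f)  # plays the role of 2N + 1
    return np.array([np.dot(f[lag:], f[:length - lag]) / length
                     for lag in range(max_lag + 1)])

def quadratic_form_matrix(phi):
    """Symmetric Toeplitz matrix with entries phi(|mu - nu|), as in (4.615)."""
    idx = np.arange(len(phi))
    return phi[np.abs(idx[:, None] - idx[None, :])]

rng = np.random.default_rng(0)
record = rng.standard_normal(41)           # hypothetical data, N = 20
phi = windowed_autocorr(record, max_lag=10)
min_eig = float(np.linalg.eigvalsh(quadratic_form_matrix(phi)).min())
print("smallest eigenvalue:", min_eig)     # non-negative up to rounding error
```

A non-negative smallest eigenvalue is equivalent to the quadratic form being non-negative for every choice of coefficients, which is the property the text requires of an admissible approximation.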


In the discrete case, the fundamental equations we have to solve are 
the system 


xilacteo) -E 35 gate — 2)0j(x) (920) (4.688) 
corresponding to (4.105). If we put 


E_ wale = Xsw); EB eae = Pale); 


E Oye = g(e); (4.64) 
this leads us to 


n Co] 
Kz (w)e*or = DD © 3,(w)gj(w) = yy b,xe**”. (4.645) 
j=l 1 
We proceed from this to the analogue of (4.26), namely 


[el — 9 (eo) F512 (ce) ~ Foy (w)gae) = 3 d,e"; 
i (4.65) 


[e** — 91 (a)]B12(@) — F22(w)go(w) = a eee" 


Forming the determinant of these two equations, we get 


fe” — 1 (w)}B(w) = — E byes () + EDee*a1(w). (4.655) 


Now let us put 
&(w) = ¥(w)- VB), (4.66) 
where 


U(w) = do + DG (4.665) 
1 


has no poles below the real axis. Then 


[e* — WM) = Leena) + Efe). (67) 


Hence 
gi (w)¥(w) = (4.675) 
j (yekees F x : = s = , “ 
= qin ff fy (u)e'** ay = ee" "Boo (u) a ¥ fe™'So1 wo} et” du, 
2 0 —* 1 1 


That is, ¢:(w)V(w) is the summation of 


~ E vive f * w(ueOt) dy (4.68) 
0 -s 
and terms of the form 


Ay" O<Sk< p) (4.685) 


where p is smaller by one than the degree of the highest term (containing 
2 positive power of ¢~**) in either 2o2(w) or Boi (w). 
We then have 





: pee tke 
sit (or The al [lmao au EI 


— $22(w)g2(w) = al (4.69) 


which we solve in a similar manner, obtaining 


ee ee a 
a2(0) = aw) ee us Vo(u) { 


@ Azer 4 
> ine o(atp) ——} pu” 
-st ¢ [vo dy — = Sole du. (4.691) 
Finally, 


B11 (w) {6 * — g3(w)} — S21 (e)g2(w) = Eheim, (4.692) 


which gives us a set of simultaneous linear equations in the Ax, adequate 
to determine their values. 

Once we have the functions $q_1(\omega)$ and $q_2(\omega)$, we develop them into 
Fourier series, the coefficients of which will be respectively $Q_1(\nu)$ and 
$Q_2(\nu)$. Then the best prediction of $f_1(t+\alpha)$ will be 

$$\sum_{\nu=0}^{\infty} f_1(t-\nu)\,Q_1(\nu) + \sum_{\nu=0}^{\infty} f_2(t-\nu)\,Q_2(\nu). \tag{4.693}$$


In essence, our methods for the treatment of time series are very 
closely related to the conventional methods which develop the ellipsoid 
determined by the correlation coefficients of the several quantities 
$f_j(t-\nu)$. The existing methods, however, do not take adequate 
consideration of the time structure of the data correlated. The fact that 
they show a statistical invariance under the translation group is a certain 
indication that the Fourier methods are desirable. Consequently no 
use is generally made of the simplifications which result from the 
consideration of the entire past of a function as the basis of prediction, 
rather than of two or three fixed epochs in the past, and such a technique 
grows unmanageably complicated as a larger and larger number of 
epochs are taken as the basis for prediction. To determine a reasonable 
distribution of the past epochs capable of serving as the basis of 
prediction is also practically impossible by the methods of previous techniques. 
On the other hand, our methods really do make use of the structure of 
the translation group, which dominates all time series and gives an 
intelligible asymptotic theory when prediction on the basis of the entire 
past is considered. 

The methods which we have here given for the case of a double time 
series are extensible, without difficulty of principle, to time series of any 
multiplicity. We have given clear indications, moreover, as to how the 
filter problem is to be handled as well as the predictor problem. 


CHAPTER V 


MISCELLANEOUS PROBLEMS ENCOMPASSED BY 
THE TECHNIQUE OF THIS BOOK 


5.0 The Problem of Approximate Differentiation 


It will have become obvious to the thoughtful reader that the methods 
of this book may be extended considerably beyond the prediction and 
filtering problems to which they have been already applied. The indication 
for their use is: the existence of a linear problem, invariant under 
the translation group, which is not capable of an exact solution, but in 
which a measure of the failure of an approximate solution may be given 
as the mean of the square of the modulus of an expression known to be 
linear in terms of the function upon which we are operating. 

A very important practical problem is that of the determination of 
the derivative of a message function which is corrupted by a noise. This 
is a problem of vital importance to all designers of servo-mechanisms. 
If a set of low-frequency data is disturbed by a high-frequency noise, 
then, if we take the rigorous theory of differentiation and actually seek 
to determine 

$$\frac{f(t+\Delta t) - f(t)}{\Delta t} \tag{5.00}$$

for a very short $\Delta t$, the result will have more to do with the disturbance 
than with the message. If, on the other hand, we take an excessively 
large $\Delta t$, we shall obviously have thrown away valid information 
concerning the derivative. The whole question turns on the proper definition 
of the word “excessive.” 
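The trade-off just described can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical message sin t whose readings carry independent Gaussian noise of standard deviation 0.01; the function name and all parameter values are illustrative assumptions, not part of the text.

```python
import numpy as np

def rms_error_of_difference_quotient(dt, noise_std=0.01, n_points=2000, seed=0):
    """RMS error of [f(t+dt) - f(t)] / dt as an estimate of f'(t) for noisy f."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 20.0, n_points)
    # Hypothetical message sin(t); each reading carries its own noise sample.
    later = np.sin(t + dt) + noise_std * rng.standard_normal(n_points)
    earlier = np.sin(t) + noise_std * rng.standard_normal(n_points)
    estimate = (later - earlier) / dt
    return float(np.sqrt(np.mean((estimate - np.cos(t)) ** 2)))

errors = {dt: rms_error_of_difference_quotient(dt) for dt in (1e-4, 0.1, 2.0)}
for dt, err in errors.items():
    print(f"dt = {dt:g}: rms error = {err:.3f}")
```

With these assumed values the very short interval is dominated by the noise, the very long one by the loss of information about the derivative, and the intermediate one does better than either.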

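As before, let $f(t)$ be a message and $g(t)$ a noise. Let 

$$\varphi_1(\tau) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t+\tau)\,\overline{f(t)}\,dt;$$

$$\varphi_2(\tau) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} g(t+\tau)\,\overline{g(t)}\,dt;$$

$$0 = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t+\tau)\,\overline{g(t)}\,dt;$$

$$\varphi(t) = \varphi_1(t) + \varphi_2(t). \tag{5.005}$$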
Let us seek to determine $K(t)$ so as to minimize 

$$M = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \left| f'(t) - \int_0^\infty \bigl[f(t-\tau) + g(t-\tau)\bigr]\,dK(\tau) \right|^2 dt. \tag{5.01}$$

At least formally, 

$$M = -\varphi_1''(0) - 2\,\Re\left\{ \int_0^\infty \varphi_1'(\tau)\,\overline{dK(\tau)} \right\} + \int_0^\infty dK(\sigma) \int_0^\infty \overline{dK(\tau)}\,\varphi(\tau-\sigma). \tag{5.015}$$
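Let us assume a $Q(t)$ satisfying 

$$\varphi_1'(t) = \int_0^\infty \varphi(t-\tau)\,dQ(\tau) \qquad (t > 0). \tag{5.02}$$

Then 

$$M = -\varphi_1''(0) - \int_0^\infty dQ(\sigma) \int_0^\infty \overline{dQ(\tau)}\,\varphi(\tau-\sigma) + \int_0^\infty d[K(\sigma) - Q(\sigma)] \int_0^\infty \overline{d[K(\tau) - Q(\tau)]}\,\varphi(\tau-\sigma). \tag{5.025}$$

We shall use the notation of Chapter III and shall assume that $\Phi(\omega)$ has 
no real zeros. Then, as in Chapter III, $M$ is minimized when and only 
when 

$$K(\sigma) = Q(\sigma). \tag{5.03}$$

We solve (5.02) as in Chapter III. Using the notation of that chapter, 

$$i\omega\,\Phi_1(\omega) - \Phi(\omega)\,q(\omega) = H(\omega), \tag{5.035}$$

where $H(\omega)$ is free from singularities in the upper half-plane. Then, by a 
reasoning now familiar, 

$$q(\omega) = \frac{1}{2\pi\Psi(\omega)} \int_0^\infty e^{-i\omega t}\,dt \int_{-\infty}^{\infty} \frac{iu\,\Phi_1(u)}{\overline{\Psi(u)}}\,e^{iut}\,du. \tag{5.04}$$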
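5.1 An Example of Approximate Differentiation 

Let us turn to particular cases. Let 

$$\Phi_1(\omega) = \frac{1}{1+\omega^4}; \qquad \Phi_2(\omega) = \epsilon^4. \tag{5.10}$$

Then 

$$\Phi(\omega) = \frac{1 + \epsilon^4 + \epsilon^4\omega^4}{1+\omega^4} \tag{5.105}$$

and 

$$\Psi(\omega) = \frac{\sqrt{1+\epsilon^4} + \epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4}\,i\omega - \epsilon^2\omega^2}{1 + \sqrt{2}\,i\omega - \omega^2}. \tag{5.11}$$

Thus we see that 

$$\frac{i\omega\,\Phi_1(\omega)}{\overline{\Psi(\omega)}} = \frac{i\omega}{(1 + \sqrt{2}\,i\omega - \omega^2)\bigl(\sqrt{1+\epsilon^4} - \epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4}\,i\omega - \epsilon^2\omega^2\bigr)} = \frac{A\,i\omega + B}{1 + \sqrt{2}\,i\omega - \omega^2} + \frac{C\,i\omega + D}{\sqrt{1+\epsilon^4} - \epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4}\,i\omega - \epsilon^2\omega^2}. \tag{5.115}$$

We determine the unknown coefficients by the equations 

$$\begin{aligned}
B\sqrt{1+\epsilon^4} + D &= 0;\\
A\sqrt{1+\epsilon^4} - B\epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4} + C + D\sqrt{2} &= 1;\\
-A\epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4} + B\epsilon^2 + \sqrt{2}\,C + D &= 0;\\
A\epsilon^2 + C &= 0.
\end{aligned} \tag{5.12}$$

Eliminating $C$ and $D$, 

$$\begin{aligned}
A\bigl(\sqrt{1+\epsilon^4} - \epsilon^2\bigr) - B\bigl(\epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4} + \sqrt{2}\sqrt{1+\epsilon^4}\bigr) &= 1;\\
-A\bigl(\epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4} + \epsilon^2\sqrt{2}\bigr) + B\bigl(\epsilon^2 - \sqrt{1+\epsilon^4}\bigr) &= 0.
\end{aligned} \tag{5.13}$$

This gives us 

$$A\left\{ \sqrt{1+\epsilon^4} - \epsilon^2 + \frac{\bigl(\epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4} + \epsilon^2\sqrt{2}\bigr)\bigl(\epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4} + \sqrt{2}\sqrt{1+\epsilon^4}\bigr)}{\sqrt{1+\epsilon^4} - \epsilon^2} \right\} = 1. \tag{5.14}$$

Thus 

$$\begin{aligned}
A &= \frac{\sqrt{1+\epsilon^4} - \epsilon^2}{1 + 2\epsilon^4 + 2\epsilon^2\sqrt{1+\epsilon^4} + 2\epsilon(1+\epsilon^4)^{3/4} + 2\epsilon^3\sqrt[4]{1+\epsilon^4}}\\
&= \frac{\sqrt{1+\epsilon^4} - \epsilon^2}{\bigl(\epsilon^2 + \sqrt{1+\epsilon^4}\bigr)^2 + 2\epsilon\sqrt[4]{1+\epsilon^4}\bigl(\epsilon^2 + \sqrt{1+\epsilon^4}\bigr)}\\
&= \frac{\sqrt{1+\epsilon^4} - \epsilon^2}{\bigl(\epsilon^2 + \sqrt{1+\epsilon^4}\bigr)\bigl(\epsilon + \sqrt[4]{1+\epsilon^4}\bigr)^2}
\end{aligned} \tag{5.15}$$

and 

$$B = \frac{A\epsilon\sqrt{2}\bigl(\epsilon + \sqrt[4]{1+\epsilon^4}\bigr)}{\epsilon^2 - \sqrt{1+\epsilon^4}} = \frac{-\epsilon\sqrt{2}}{\bigl(\epsilon^2 + \sqrt{1+\epsilon^4}\bigr)\bigl(\epsilon + \sqrt[4]{1+\epsilon^4}\bigr)}. \tag{5.16}$$

Hence 

$$q(\omega) = \frac{1}{\bigl(\epsilon^2 + \sqrt{1+\epsilon^4}\bigr)\bigl(\epsilon + \sqrt[4]{1+\epsilon^4}\bigr)^2} \cdot \frac{\bigl(\sqrt{1+\epsilon^4} - \epsilon^2\bigr)\,i\omega - \epsilon\sqrt{2}\bigl(\epsilon + \sqrt[4]{1+\epsilon^4}\bigr)}{\sqrt{1+\epsilon^4} + \epsilon\sqrt{2}\,\sqrt[4]{1+\epsilon^4}\,i\omega - \epsilon^2\omega^2}. \tag{5.17}$$

It will be seen at once that this tends to $i\omega$ as $\epsilon$ tends to 0. This is 
precisely as it should be. 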
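The algebra of this example can be checked numerically. The following is a minimal sketch, assuming the noise level Φ₂ = ε⁴ and writing s = iω so that the partial-fraction conditions become a 4 × 4 linear system in A, B, C, D; the function names are illustrative and not from the text.

```python
import numpy as np

def partial_fraction_coefficients(eps):
    """Solve the four linear conditions of (5.12) for A, B, C, D."""
    a = np.sqrt(1 + eps**4)
    b = eps * np.sqrt(2) * (1 + eps**4) ** 0.25
    c = eps**2
    # Rows: coefficients of s^0, s^1, s^2, s^3 with s = i*omega.
    system = np.array([
        [0.0, a, 0.0, 1.0],
        [a, -b, 1.0, np.sqrt(2)],
        [-b, c, np.sqrt(2), 1.0],
        [c, 0.0, 1.0, 0.0],
    ])
    return np.linalg.solve(system, np.array([0.0, 1.0, 0.0, 0.0]))

def closed_form_A_B(eps):
    """Closed forms (5.15) and (5.16)."""
    a = np.sqrt(1 + eps**4)
    r = (1 + eps**4) ** 0.25
    A = (a - eps**2) / ((eps**2 + a) * (eps + r) ** 2)
    B = -eps * np.sqrt(2) / ((eps**2 + a) * (eps + r))
    return A, B

def q(omega, eps):
    """Operator (5.17): the retained fraction divided by Psi(omega)."""
    A, B, _, _ = partial_fraction_coefficients(eps)
    a = np.sqrt(1 + eps**4)
    b = eps * np.sqrt(2) * (1 + eps**4) ** 0.25
    s = 1j * omega
    return (A * s + B) / (a + b * s + eps**2 * s**2)

print(partial_fraction_coefficients(0.3)[:2], closed_form_A_B(0.3))
print(q(2.0, 1e-4))   # close to 2j, i.e. to i*omega, for small eps
```

The solved coefficients agree with the closed forms, and for small ε the operator approaches iω, which is the behaviour the text describes.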


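5.2 A Misleading Example of Approximate Differentiation 

A case which presents certain difficulties is that in which 

$$\Phi_1(\omega) = \frac{1}{1+\omega^2}; \qquad \Phi_2(\omega) = \epsilon^2. \tag{5.20}$$

Here 

$$\Phi(\omega) = \frac{1 + \epsilon^2 + \epsilon^2\omega^2}{1+\omega^2} \tag{5.21}$$

and 

$$\Psi(\omega) = \frac{\sqrt{1+\epsilon^2} + \epsilon\,i\omega}{1 + i\omega}. \tag{5.22}$$

This leads to 

$$\frac{i\omega\,\Phi_1(\omega)}{\overline{\Psi(\omega)}} = \frac{i\omega}{(1+i\omega)\bigl(\sqrt{1+\epsilon^2} - \epsilon\,i\omega\bigr)} = \frac{A}{1+i\omega} + \frac{B}{\sqrt{1+\epsilon^2} - \epsilon\,i\omega}. \tag{5.23}$$

We determine the unknown coefficients by the equations 

$$\begin{aligned}
A\sqrt{1+\epsilon^2} + B &= 0;\\
-A\epsilon + B &= 1.
\end{aligned} \tag{5.24}$$

Eliminating $B$, 

$$A = \frac{-1}{\epsilon + \sqrt{1+\epsilon^2}}. \tag{5.25}$$

Thus 

$$q(\omega) = \frac{-1}{\bigl(\epsilon + \sqrt{1+\epsilon^2}\bigr)\bigl(\sqrt{1+\epsilon^2} + \epsilon\,i\omega\bigr)}. \tag{5.26}$$

It will be seen that, as $\epsilon \to 0$, this tends to $-1$ and not to $i\omega$, so that 
the operator $q(\omega)$ has nothing to do with differentiation. The reason is 
that the assumed function $f(t)$ with the given $\Phi_1(\omega)$ is almost never 
differentiable. The expression $M$ to be minimized is infinite, and its 
minimization cannot lead to a significant problem. 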
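The limiting behaviour of this second example is easy to confirm numerically. The sketch below assumes the form of q(ω) given in (5.26); the function name and the test frequency are illustrative only.

```python
import numpy as np

def q_misleading(omega, eps):
    """Operator (5.26) for Phi_1 = 1/(1 + omega^2) and Phi_2 = eps^2."""
    root = np.sqrt(1 + eps**2)
    return -1.0 / ((eps + root) * (root + eps * 1j * omega))

for eps in (0.5, 0.1, 1e-6):
    print(eps, q_misleading(3.0, eps))   # approaches -1, not 3j, as eps shrinks
```

However small the assumed noise, the operator stays near the constant −1 rather than approaching iω, in agreement with the remark that the quantity being minimized is infinite for this spectrum.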


5.3 Interpolation and Extrapolation 


Another problem which may be attacked by our methods is that of 
interpolation. Here we shall explicitly confine ourselves to the formal 
theory and avoid considerations of convergence. If f(t) is of such a nature 
that (w) vanishes outside (~7, 1), it is a known fact* that 

sin r(m — t) 
{OQ = = f(2n agen Peer 5 (5.30) 
As will be seen, this involves the knowledge of every f(2nr) for 
—o <n < o., It is thus a matter of interest to see what becomes of 
the best approximation to this expression in terms of the present and 
past of f(2n7) only. 





Let us put 
iat 1 oN : i, z se + n)]fQx) = om (5.31) 
Formally, 
M = 1 E | E stent my ee 
Tiag Oi gael ic m — t) 





2 
= > [2x (n — k)1Qx 
k=0 


bed = sin r(m — t) sinaw(n — t) 
- 2.2 m(m—t) a(n—2) 
é an | me z ngs aD ee = S vei; (6.32) 


*See Paley and Wiener, Fourier Transforms in the Complex Domain, American 
Mathematical Society Colloquium Publications, Vol. 19, 1934. 
By exactly the same reasoning by which we have obtained (3.145) or 
(5.02), the minimization of this expression leads to the integral equation 


ao ate = > = Ends (k > 0). (5.83) 


As a consequence of this, 


sin r(m — of . os 
° ma—% a(m — t) = * z, er—jQ; 


= > Ax", (534) 
i 


which we may write 


= act > ae ee p¥ ei} => Ae", 
kas n=—@ a(m — t) 7=0 1 
(5.35) 
Now let us define 
Glo) = LD gee, (5.355) 
and let us notice that 
= pime Si (m —t)e Lime 
ze (m — t)x ie 
imu a, — ("(-4 SoS 8); 
x cae du F iia (5.36) 
We get 
}(w) fe -z yee} = £ Ane. (5.37) 
j=0 1 
If we factor @(w) as before, we get 
¥(o) fo — & gel = E Bye, (6.38) 
f=0 1 


or finally 
—ijo tee - ine cf tau 
q(#) = z ae = Fe [ u(ue du, (5.39) 


which is our interpolation formula. It will be seen that it contains our 
extrapolation formula, with which it formally agrees. 

Similar formulae may be obtained which are related to the interpola- 
tion formula as the filter formula is related to the prediction formula. 
Modifications may be made in the techniques of this book in an almost 
unlimited number of ways. For example, what we have done for the 
simple differentiator may be repeated without difficulty in the case of 
higher derivatives. It is scarcely worth while to go into details, for these 
details will be clear to anyone who has followed the discussion up to the 
present point. 
TABLE OF THE LAGUERRE FUNCTIONS 


0 
1.4142 
1.4001 
1.3862 
1.3724 
1.3587 


1.3452 
1.3312 
1.3186 
1.3055 
1.2925 


1.2796 

1.1578 

1.0477 
-94797 
-85766 


-77614 
-70228 
-63545 
.57497 
-52026 


47075 
42595 
-38542 
84874 
31555 


28552 
-25838 
-23376 
-21169 
-19139 


-17318 
-15670 
-14179 
12829 
-11608 


-10504 
-09504 
-08600 
-07781 
-07041 


1 


—1.4142 
—1.3721 
—1.3307 
—1.2901 
—1.2500 


—1.2106 
—1.1714 
—1.1340 
— 1.0966 
—1.0598 


—1.0237 
— .69472 
— .41907 
— .18959 

-00000 


-15523 
-28091 
-38127 
-45998 
-52026 


-56490 
59633 
.61667 
62773 
-63111 


-62815 
-62005 
-60779 
-59273 
-57418 


-55417 
-53278 
-51043 
-48752 
46434 


44116 
-41819 
-39559 
37351 
35205 


Ea i rm fa eel Wes ea eg t 


2 


1.4142 
1.3444 
1.2764 
1.2102 
1.1457 


1.0829 

1.0211 
-96231 
-90443 
-84813 


79337 
-382420 
-02095 
26543 
-42888 


-52777 
-57586 
-58461 
56347 
-52026 


46134 
-39188 
.31604 
23714 
15777 


.07995 
-00517 
-06545 
-13124 
-19139 


-24672 
-29459 
83745 
-37462 
-40630 


-43276 
-45431 
-47127 
-48400 
-49287 


3 


—1.4142 
—1.3169 
— 1.2232 
1.1327 
— 1.0456 


96161 
88010 
80294 
72808 


— .65610 


58692 
03396 


-31011 
-49800 
55897 


-56503 
-50376 
-40838 
29438 
17342 


05398 
05792 
15828 
24458 
81555 


-37080 
41061 
43574 
44764 
44658 


43632 
41410 
-38529 
-34999 
30956 


26525 
21822 
16947 
-11993 


— .07041 


4 


1.4142 
1.2898 
1.1710 
1.0577 
-94957 


-84660 
-T4791 
-65532 
-56670 
48255 


40274 
18711 
48046 
-57283 
-53610 


-42346 
-27337 
-11285 
-03990 
-17341 


28123 
-36051 
41114 
43496 
43389 


41221 
-37341 
32129 
-25967 
-19140 


-12159 
04821 
-02188 
-08845 
-14995 


5 


—1.4142 
—1.2629 
—1.1199 


— .98490 


85758 


-73766 
-62419 
-51887 
-41946 
32635 


- 23298 
-84900 
-55909 
-54103 
39993 


-20857 
-01251 
-15969 
29296 
-38151 


-42566 
42952 
39964 
34331 
-26824 


-18152 
08972 
-00161 
-08806 
- 16587 


-23103 
-28634 
-32743 
-35427 
.36759 


-36820 
-85704 
-33558 
80522 
-26657 
TABLE OF THE LAGUERRE FUNCTIONS (Continued) 


9 1 2 3 4 5 
-06371 -83129 -49820 — .02158 —.36932 22407 
05764 31129 50037 -02596 —.37915 17635 
-05216 -29210 49970 07176 =—.38193 12589 
04720 273874 49651 -11541 —.37817 -O7411 
-04270 25623 49111 -15658  ~—.36834 .02208 
.03864 123958 48379 “19506 —.35311 — .02889 
.03496 22377 47481 -23067 —.33305 — .07794 
.03164 -20881 -46443 .26331 —.30884 — 12413 
02862 -19466 45287 -29289  —.28103 — .16692 
02590 18131 44273 -31946 = =—.25041 — .20549 
-02344 -16875 -42702 .84300 = — .21738 — .23959 
.02121 -15693 -41311 .86357 = — .18256 — .26899 
01919 14583 -39874 .38125 —.14654 — .29341 
-01736 13543 38406 -89619 —-.10976 — 31264 
.01571 12568 -36919 -40848 —.07270 — .32676 
-01421 -11656 85424 .41824 —.03568 — .83598 
-01286 -10804 -33931 42563 3)88260 ~— .34042 
-01164 -10009 32448 -43081 1)36668 — .34032 
01053 1)92673 80982 -43393 1)71381 — 33577 

2)95289 1)85759 29539 643515 -10482 — .32718 
2)86221 1)79323 28125 -43462 13677 — .31491 
2)78016 1)73335 26744 +43249 16705 — .29929 
2)70592 1)67768 25399 42892 19554 — .28060 
2)63874 1)62597 24093 -42407 -22218 — 25933 
2)57796 1)57796 -22829 -41805 24683 — .23568 
2)52296 1)53342 -21608 -41102 26951 — .21021 
2)47319 1)49212 -20432 -40307 -29017 — -18333 
2)42816 1)45386 -19301 89437 -30877 — .15501 
2)38741 1)41841 18216 -38501 -32539 — .12586 
2)35055 1)38560 -17177 37507 -34011 — .08625 
2)31719 1)35524 -16182 -386475 85263 — .06573 
2)28701 1)32718 -15233 -35393 -36366 — .03615 
2)25969 1)30125 14329 -84294 37267 — .00614 
2)23498 1)27728 -13469 33168 -38004 -02318 
2)21262 1)25514 -12651 32032 -38571 -05203 
2)19238 1)23470 -11874 30888 -88977 -08021 
2)17408 1)21585 -11137 29746 39238 10737 
2)15751 1)19847 -10439 - 28605 -39368 - 13331 
2)14252 1)18243 -09780 «27472 89363 -15823 
2)12896 1)16765 -09156 26349 39249 -18156 
2)11669 1)15402 -08567 25245 -39016 - 20384 
2)10558 1)14148 -08012 -24159 -38689 22452 
3)95536 1)12993 -07488 +23095 38264 24382 
3)86444 1)11929 -06995 22054 37757 -26162 


3)78218 1)10950 -06532 21039 37185 27778 
TABLE OF THE LAGUERRE FUNCTIONS (Continued) 


0 


3)70775 
3)64039 
3)57945 
3)52431 
3)47441 


3.)42927 
3-)38841 
3)35145 
3)31801 
3)28775 


3)26036 
3)23559 
3)21317 
319288 
3)17453 


3)15792 
3)14289 
3)12929 
3)11699 
3)10586 


4)95783 
4)86668 
4)78420 
4)70958 
4)64205 


4)58095 
4)52567 
4)47564 
4)43038 
4)38942 


4)35236 
4)31883 
4)28849 
4)26104 
4)23620 


4)21372 
4)19338 
4)17498 
4)15833 
4)14326 


4)12963 
4)11729 
4)10613 
5)96031 
5)86892 


1 


1)10050 
2)92214 
2)84599 
2)77599 
2)71162 


2)65249 
2)59816 
2)54827 
2)50245 
2)46039 


2)42179 
2)38636 
2)35386 
2)32404 
2)29670 


2)27162 
2)24863 
2)22756 
2)20824 
2)19054 


2)17432 
2)15947 
2)14586 
2)13340 
2)12199 


2)11154 
2)10198 
3)93226 
3)85216 
3)77886 


3)71178 
3)65042 
3)59429 
3)54206 
3)49602 


3)45309 
3)41384 
3)37796 
3)34515 
3)31518 


3)28778 
3)26273 
3)23985 
3)21896 
3)19986 
2 3 
1)60951 .20053 
1)56850 .19033 
1)53006 .18166 
1)49400 .17266 
1)46016 .16775 
1)42849 115562 
1)39883 .14757 
1)37107 .13982 
1)34511 .13239 
1)32084 .12526 
1)29817 .11844 
1)27701 .11192 
1)25725 .10569 
1)23882 1)99720 
1)22165 1)94071 
1)20564 1)88673 
1)19073 1)83535 
1)17685 —-1) 78654 
1)16393 1)74017 
1)15191 1)69619 
1)14072 1)65447 
1)13033 1)61496 
1)12067 ——-1) 57754 
1)11170 1)54217 
1)10337  ——-1)50872 
2)95636 1)47712 
2)88459 1)44729 
2)81801 1)41915 
2)75626 1)39262 
2)69901 1)36761 
2)64596 1)34407 
2)59679 1)32191 
2)55125 1)30106 
2)50908 1)28146 
2)47008 1)26304 
2)43391 1)24574 
2)40046 1)22950 
2)36952 1)21426 
2)34090 1)19997 
2)31445 1)18657 
2)28999 1)17401 
2)26741 1)16226 
2)24652 1)15124 
2)22723 1)14093 
2)20976 1)13119 


4 


86535 
-35833 
-35080 
84283 
33445 


-82579 
-81687 
80774 
29847 
-28910 


1.)90109 
185356 
1)80831 
1)76433 
1)72270 


1)68302 
1)64519 
1)60918 
1)57489 
1)54251 
TABLE OF THE LAGUERRE FUNCTIONS (Continued) 


0 
5)78623 
5)71141 
5)64371 
5)58245 
5)52703 


5)47687 
5)43149 
5)39043 
§)35328 
5)31966 


5)28924 
5)26171 
5)23681 
5)21427 
5)19388 


5)17543 
5)15874 
5)14363 
512996 
5)11759 


5)10641 
6)96279 
6)87117 
6)78827 
6)71325 


6)64538 
6)58396 
6)52839 
6)47811 
6)43261 


6)39144 
6)35419 
6)32048 
6)28999 
6)26239 


6)23742 
6)21483 
6)19438 
6)17588 
6)15915 


6)14400 
6)13030 
6)11790 
6)10668 
7)98528 


1 


3)18241 
3)16647 
3)15192 
3)13862 
3)12649 


3) 11540 
2)10528 
4)96046 
4)87613 
4)79914 


4) 72883 
4)66475 
4)60623 
4)55283 
4:)50409 


4)45963 
4)41907 
438206 
4)34830 
4)31750 


4° 28942 
4° 26379 
4' 24045 
421914 
4‘ 19971 


4)18245 
4)16584 
4)15112 
4)13768 
4)12546 


4)11431 
4)10414 
5)94863 
5)86417 
5)78718 


5)71702 
5)65308 
5)59482 
5)54173 
5)49336 


5)44929 
5)40914 
5)37256 
5)33924 
5)30889 


2 


2)19295 
2)17776 
2)16374 
2)15081 
2)13888 


2)12787 
2)11770 
2)10834 
3)99693 
3)91739 


3)84400 
3)77646 
3)71417 
3)65679 
3)60394 


3)55528 
8)51047 
3)46922 
3)43125 
3)39631 


3)36413 
3)33924 
3)30732 
3)28228 
3)25926 


3)23811 
3)21862 
3)20073 
3)18427 
3)16915 


3)15525 
3)14248 
3)13075 
3)11997 
3)11007 


3)10098 
4)92629 
4)84962 
4) 77924 
4)71458 


4) 65524 
4)60078 
4)55081 
4)50494 
4)46285 


3 


1)12228 
1)11391 
1)10597 
2)98610 
2)91738 


2)85323 
2)79337 
2)73751 
2)68803 
2)63687 


2)59160 
2)54943 
2)51016 
2)47358 
2)43953 


2)40784 
2)37836 
2)35093 
2)32542 
2)30171 


2)27967 
2)25918 
2)24017 
2)22249 
2)20608 


2)19085 
2)17671 
2)16359 
2)15141 
2)14012 


2)12965 
2)11994 
2)11094 
2)10260 
3)94874 


3)87712 
3)81092 
3)74936 
3)69248 
3)63983 


3)59109 
3)54599 
3)50425 
3)46564 
3)42994 


4 
1)51131 
1)48190 
1)45398 
1)42751 
1)40241 


1)37863 
1)35612 
1)33482 
1)31467 
1)29563 


1)27764 
1)26065 
1)24462 
1)22950 
1)21524 


1)20179 
1)18907 
1)17721 
1)16598 
1)15552 


1)14559 
1)13226 
1)12738 
1)11914 
1)11138 


1)10415 
2)97302 
2)90924 
2)84921 
2)79303 


2)74034 
2)69102 
2)64471 
2)60152 
2)56094 


2)52325 
2)48760 
2)45461 
2)42362 
2)30453 


2)36748 
2)34219 
2)31859 
2)29656 
2)27599 




5 
14566 
.13932 
-13316 
-12720 
12143 


11585 
11046 
10526 
10025 

1)95420 


1)90776 
1)86312 
1)82025 
1)77912 
1)74574 


1)70190 
1)66574 
1)63113 
1)59807 
1)56654 


1)53630 
1)50552 
1)48013 
1)45399 
1)42918 


1)40539 
1)38292 
1)36153 
1)34117 
1)32189 


1)30357 
1)28614 
1)26965 
1)25401 
1)23932 


1)22525 
1)21194 
1)19947 
1)18756 
1)18148 


1)16578 
1)15578 
1)15198 
1)13744 
1)12903 
TABLE OF THE LAGUERRE FUNCTIONS (Continued) 

0 1 2 3 4 5 
7)87343 5)28125 4)42424 3)39691 2)25680 1)12111 
7)79031 5)25606 4)38882 3)36637  2)23889 1)11365 
7)71510 5)23312 4)35632 3)33814 2)22219 1)10661 
7)64705 5)21223 4)32652 3)31204 2)20661 2)99985 
7)58547 5)19273 4)29844 3)28722  2)19164 2)93520 


7)52976 5)17588 4)27410 3)26563 2)17857 2)87874 
7)47934 5)16010 4)25112 3)24502 = 2) 16596 2)82348 
7)43378 5)14573 4)23004 3)22600 2)15421 2)77151 
7)39245 5)13265 4)21072 3)20843 2)14328 2)72264 
7)35511 5)12074 4)19300 3)19220 2)13309 2)67669 


7)32131 5)11310 4)17676 3)17721 2)12361 2)63352 
7)29074 5)10001 4)16188 3)16336 = 2)11477 2)59295 
7)26307 6)93390 4)14823 3)15059 =. 2) 10656 2)55486 
7)23803 6)82836 4)13573 3)13879  3)98916 2)51909 
7)21538 6)75385 4)12427 3)12791 3)91809 2)48553 


7) 17634 6)62425 4)10415 3)10860  3)79046 2)42447 
7)14437 6)51687 5)}87270 4)92171 3)68012 2)37077 
7)11821 6)42790 5)73122 4)78191 3)58483 2)32359 
8)96778 6)35421 5)61239 4)66299  3)50261 2)28219 


8)79235 6)20317 5)51347 4)56197  § 3)42768 2)24783 
8)64872 6)24262 5)42916 4)47613  3)37055 2)21407 
8)53113 6)20076 5)35915 4)40325 3)31791 2)18624 
8)43485 6)16611 5)30044 4)34141 3)27258 2)16521 
8)35603 6)13743 5)25123 4)28893 3)23361 2 14065 


8)29149 6)11368 5)21023 4)24443 3)20008 2)12209 
8)17680 7)70719 5)14725 4)16067 3)13556 3)84623 
8)10723 752003 6)85611 4)10539 4)91562 3)59589 
9)39449 7)17818 6)34734 5)44633  4)41429 3)28638 
9)14512 8)66612 6)14031 5)18969 4)18552 3) 13575 


10)53388 8)25092 7)56435 6)80081 5)82313 4)58138 
10)19641 9)96238 7)22612 6)33600  5)36214 4)29425 
11)72253 9)36849 8)90276 6)14021 5)15811 4)13589 
12}97784 10)53781 8)14246 7)24048 6)29519 5)44859 
12)13234 11)78078 9)22246 8)40535 7)53803 6)54553 
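The leading entries of the table can be reproduced by a short computation. The sketch below rests on an inference from the numbers themselves rather than on any statement in the table: the first rows of columns 0 to 5 are consistent with the functions (−1)ⁿ √2 e⁻ᵗ Lₙ(2t), where Lₙ is the Laguerre polynomial, tabulated initially at steps of 0.01 in t. The function name is illustrative.

```python
import numpy as np
from numpy.polynomial.laguerre import lagval

def laguerre_function(n, t):
    """Assumed form of the tabulated function: (-1)^n sqrt(2) e^{-t} L_n(2t)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                     # select the single polynomial L_n
    return (-1) ** n * np.sqrt(2.0) * np.exp(-t) * lagval(2.0 * t, coeffs)

for t in (0.0, 0.01):
    print(t, [round(float(laguerre_function(n, t)), 4) for n in range(6)])
```

Under this assumption the values at t = 0 are ±1.4142 and those at t = 0.01 agree with the second tabulated row of each column to about one unit in the last printed place.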
THE WIENER RMS (ROOT MEAN SQUARE) ERROR 
CRITERION IN FILTER DESIGN AND PREDICTION* 


By Norman Levinson 


In the process of gathering or transmitting information by mechanical 
or electrical means the signal that contains the information frequently 
becomes distorted. Among the diverse sources of distortion there may be 
tracking errors, crosstalk, thermal noise, and poor characteristics of 
pickup, transmitting, or receiving equipment. When the distortion has 
random statistical features it is called noise. 

The modification of a signal is sometimes necessary in order to remove 
the noise and recapture the original message. This process is called 
filtering. The determination of how much of the noise can be separated 
from the message contained in a signal is by no means simple. Cases 
exist where a crude filter will perform as well as an extremely complicated 
one. There are cases where the very best results require an elaborate 
filter but where only slightly inferior results can be obtained by using 
comparatively simple filters. 

In this article a method will be presented for determining quantitatively 
the extent to which message and noise can be separated. Also 
will be given a method of designing a filter to carry out this separation. 
The close of the article will consider the problem of filtering and predict- 
ing simultaneously. The root mean square error approach used here is an 
approximation to and a simplification of the transcendental case devel- 
oped by N. Wiener. 

Wiener’s work appeared in a book of limited circulation in February, 
1942. An independent and similar but by no means identical work by 
Kolmogoroff had already appeared in Bulletin de l’académie des sciences 
U.S.S.R., pp. 3-14, 1941. 

A few months after Wiener’s work appeared, the author, in order to 
facilitate computational procedure, worked out an approximate, and one 
might say, mathematically trivial procedure. This procedure is essen- 
tially that which appears in Secs. 2, 3, and 6 of the present paper. It is, 
actually, classical least squares. 


* Reprinted from Journal of Mathematics and Physics, Vol. XXV, No. 4, January, 
1947, pp. 261-278. 
The basic idea underlying Wiener’s work, rather than the intricate 
mathematical procedure for solving the transcendental problem, influ- 
enced work on smoothing and prediction in fire control work. Various 
classified documents have appeared some of which have lately been 
declassified. Of these the author has seen and been influenced [so far as 
minimizing the expression (35) of this paper is concerned] by the work 
of Phillips and Weiss, Theoretical Calculation on Best Smoothing of 
Position Data for Gunnery Prediction, NDRC Report 532, February, 
1944. 

Another relevant report which the author knows only by title is by 
Blackman, Bode, and Shannon, Monograph on Data Smoothing and 
Prediction in Fire Control Systems, NDRC Report, February, 1946. 


1 Linear Filters 


Here the discussion will be limited to linear filtering devices. The 
behavior of a linear filter may be expressed in terms of its impedance 
function. This impedance function gives the characteristic of the filter 
in terms of the relationship of the amplitude and phase of the output to 
those of any sinusoidal input. A dual way of indicating the behavior of 
the filter is in terms of the output corresponding to an input which is a 
unit-step function. If, when the input is 

$$\begin{cases} 0, & t < 0 \\ 1, & t > 0 \end{cases}$$

we denote the output by $A(t)$, then corresponding to any input $f(t)$ the 
output is 

$$F(t) = \int_0^\infty A'(\tau)\,f(t-\tau)\,d\tau + A(0)\,f(t). \tag{1}$$

By $A(0)$ is meant the limit of $A(t)$ as $t$ approaches zero through positive 
values. 

It is useful for many purposes to approximate to the integral in Eq. (1) 
by a sum. We have from Eq. (1) approximately, if $h$ is small, 

$$F(t) = h \sum_{n=1}^{\infty} A'(nh)\,f(t-nh) + A(0)\,f(t). \tag{2}$$

In case $A'(\tau)$ is small when $\tau$ is large the tail of the infinite series can be 
discarded. Using the notation $A_n = hA'(nh)$, $n > 0$, and $A_0 = A(0)$, 
we have approximately, for some suitably chosen $M$, 

$$F(t) = \sum_{n=0}^{M} A_n f(t-nh).$$

This last result states that the output is given approximately by a certain 
linear combination of the input and a number of its past values. Or, to 
put it differently, $F(t)$ is given approximately by a weighted sum of a 
number of past values of the input. In case $F(t)$ and $f(t)$ are adequately 
determined by their values at $t = kh$, we find 

$$F(kh) = \sum_{n=0}^{M} A_n f[(k-n)h].$$

Calling $F(kh)$, $F_k$, and $f[(k-n)h]$, $f_{k-n}$, we have 

$$F_k = \sum_{n=0}^{M} A_n f_{k-n}. \tag{3}$$

2 Minimization of RMS Error 


Let $f(t)$ be a signal containing a message $g(t)$ and noise. Clearly the 
noise is given by $f(t) - g(t)$. 

Let us consider the output of an electrical circuit with input $f(t)$. If 
the circuit has the response $A(t)$ to a unit-step function, then referring 
to Eq. (1) we see that the output, with input $f(t)$, is given by 

$$F(t) = \int_0^\infty A'(\tau)\,f(t-\tau)\,d\tau + A(0)\,f(t).$$

Our goal is to have $F(t)$ approximate as closely as possible the message 
$g(t)$. That is, we want to minimize $|F(t) - g(t)|$. As a criterion for 
measuring the difference between $F(t)$ and $g(t)$ we shall take 

$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} [F(t) - g(t)]^2\,dt.$$

This is clearly the square of the rms value of $F(t) - g(t)$. The exact 
procedure for determining $A'(\tau)$ from the requirement that the rms value 
of $F(t) - g(t)$ be a minimum leads to an integral equation. 

To avoid the transcendental analysis arising when the signal is treated 
as a continuous function, $f(t)$, we choose a time interval $h$ sufficiently 
small so that $f(t)$ is well characterized by its values at the points $t = kh$, 
where $k$ assumes integral values. If we denote $f(kh)$ by $b_k$ then we can 
regard a signal as a sequence $b_k$. The message contained in the signal we 
shall denote by the sequence $a_k$, and noise, by the sequence of differences, 
$b_k - a_k$. It is our purpose to find the best way to treat the signal, that is, 
the $b_k$, so as to obtain the information, the $a_k$. 

Let us try to determine the nature of a linear filter which, with input 
$b_k$, will have an output as close as possible to $a_k$. Using Eq. (3), we see 
that our problem is to determine the numbers $A_n$ so that the 

$$\epsilon_k = a_k - \sum_{n=0}^{M} A_n b_{k-n} \tag{4}$$
are as small as possible. What we shall do to try to make the $\epsilon_k$ as small 
as possible is to require that the $A_n$ be so chosen that the average of the 
sum of the squares of the $\epsilon_k$ should be a minimum. This is equivalent to 
requiring that the rms of the $\epsilon_k$ be a minimum. 

Stated in formula, we want to choose $A_n$ so that 

$$I = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{k=-N}^{N} \Bigl( a_k - \sum_{n=0}^{M} A_n b_{k-n} \Bigr)^2 \tag{5}$$

should be a minimum. Equation (5) assumes a much simpler form if we 
introduce auto-correlation functions. The auto-correlation function is 
defined as 

$$R_a(k) = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{j=-N}^{N} a_j a_{j-k}.$$

It is an even function, that is, it has the property 

$$R_a(k) = R_a(-k).$$

We shall also be concerned with the auto-correlation function 

$$R_b(k) = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{j=-N}^{N} b_j b_{j-k},$$

and the cross-correlation function 

$$R_{ba}(k) = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{j=-N}^{N} a_j b_{j-k}.$$
The cross-correlation function is not necessarily an even function of $k$. 

It frequently happens that the message and noise are completely 
uncorrelated. In this case $R_{b-a,a}(k) = 0$ for all $k$. Since 

$$R_{ba}(k) = R_{b-a,a}(k) + R_a(k)$$

we see that if the message and noise have zero correlation 
$R_{ba}(k) = R_a(k)$. 

When $R_{ba}(k)$ is itself zero, the signal and message have no correlation. 
This means that the noise cancels the message completely and leaves 
only a random residue, making it impossible to separate any part of the 
message from the signal by a linear device. Thus $R_{ba}(k) = 0$ is the worst 
situation that can arise. 

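We can write Eq. (5) as 

$$I = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{k=-N}^{N} a_k^2 - 2 \sum_{n=0}^{M} A_n \lim_{N\to\infty} \frac{1}{2N+1} \sum_{k=-N}^{N} a_k b_{k-n} + \sum_{n=0}^{M} \sum_{m=0}^{M} A_n A_m \lim_{N\to\infty} \frac{1}{2N+1} \sum_{k=-N}^{N} b_{k-n} b_{k-m}.$$

Using the auto- and cross-correlation functions, we have 

$$I = R_a(0) - 2 \sum_{n=0}^{M} A_n R_{ba}(n) + \sum_{n,m=0}^{M} A_n A_m R_b(m-n). \tag{6}$$

If the $A_n$ are chosen so as to make $I$ a minimum we must have 

$$\frac{\partial I}{\partial A_k} = 0, \qquad k = 0, 1, \cdots, M.$$

Thus 

$$\frac{\partial I}{\partial A_k} = -2R_{ba}(k) + 2 \sum_{n=0}^{M} A_n R_b(k-n) = 0.$$

Or 

$$\sum_{n=0}^{M} A_n R_b(k-n) = R_{ba}(k), \qquad k = 0, 1, \cdots, M. \tag{7}$$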


We have derived Eqs. (7) as a necessary condition that the $A_n$ make $I$ 
a minimum. We shall have occasion to prove later that, with the $A_n$ 
obtained from Eqs. (7), $I$ actually assumes its minimum value. 

Equations (7) are a linear system of $M + 1$ equations in the $M + 1$ 
unknowns $A_n$. From Eqs. (7) we see that the determination of $A_n$ is 
dependent on the auto-correlation function of the $b$’s and on the cross 
correlation of the $a$’s and $b$’s. It does not depend on the $a$’s and $b$’s 
directly as such. Thus, while the $a$’s and $b$’s may differ from one run to 
another, if the correlation functions do not, then a set of $A_n$’s can be 
chosen once and for all which will work for all the runs. In other words, 
it is necessary for the sequences $a_k$ and $b_k$ to be elements of a stationary 
random process. 

The advantage of dealing with the discrete sequences $a_k$ and $b_k$ rather 
than with the continuous functions $g(t)$ and $f(t)$ is that in the discrete 
case we face simply the linear algebraic problem [Eqs. (7)] as contrasted 
to an integral equation in the continuous case. 

Using Eqs. (7) in (6), we see that the minimum value of $I$, $I_m$, is given 
by 

$$I_m = R_a(0) - \sum_{n=0}^{M} A_n R_{ba}(n). \tag{8}$$

The sum, $\sum A_n R_{ba}(n)$, on the right-hand side of Eq. (8), cannot be 
negative since the choice $A_n = 0$ would in that case reduce $I$. This is 
an impossibility since $I_m$ is already a minimum. Thus $\sum A_n R_{ba}(n) \ge 0$. 
The worst case that can arise is for this expression to be zero. This 
happens when all the $R_{ba}(n)$ are zero. For the $R_{ba}(n)$ to be zero means 
that the signal and the message are completely incoherent. In this worst 
of all cases, $I_m = R_a(0)$. This suggests normalizing Eq. (8) by dividing 
by $R_a(0)$: 

$$\frac{I_m}{R_a(0)} = 1 - \sum_{n=0}^{M} A_n \frac{R_{ba}(n)}{R_a(0)}.$$

If we now call $I_m/R_a(0)$, $V$, and if we set 

$$\frac{R_{ba}(n)}{R_a(0)} = \gamma_n$$

then we have 

$$V = 1 - E_M \tag{9}$$

where 

$$E_M = \sum_{n=0}^{M} A_n \gamma_n. \tag{10}$$
Obviously $V \le 1$ since $I_m \le R_a(0)$. On the other hand, since $I_m$ is the 
average of a sum of squares, $I_m \ge 0$, and thus $V \ge 0$. We see then that 

$$0 \le E_M \le 1.$$

The closer $E_M$ is to one, the smaller is the rms value of $\epsilon_k$, that is, the 
better the separation of the message from the noise. The value of $E_M$ 
increases with $M$. Ordinarily, increasing $M$ beyond a certain point will 
increase $E_M$ only very slightly. There will be a value $E = \lim_{M\to\infty} E_M$ 
beyond which it is impossible to increase $E_M$. If $E$ is small compared to 
one, it means that even the best linear filter can effectuate only a poor 
separation of the noise from the message; if $E$ is close to one, a 
considerable separation can be attained. The rms of $\epsilon_k$ for a given filter can be 
compared with $\sqrt{R_a(0)}\sqrt{1-E}$. If this rms value is almost as small as 
$\sqrt{R_a(0)}\sqrt{1-E}$, then the filter is close to optimum. 


In finding $E_M$ we use the $A_n$ as found from Eqs. (7). It is convenient
now to set

$$\frac{R_b(n)}{R_a(0)} = r_n.$$

We can write Eqs. (7) as

$$\sum_{n=0}^{M} A_n r_{k-n} = \gamma_k, \qquad k = 0, 1, \dots, M. \tag{11}$$


In practice it is usually impossible to build a filter with the $A_n$ exactly
as required. Moreover, over-all considerations may make it undesirable
to do so. In case the filter selected has its characteristic response given
by the sequence $B_n$ rather than by the desired $A_n$, the value of $I$ will be
affected. Let the difference $B_n - A_n$ be denoted by $\delta_n$. In this case we
have

$$I = \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{k=-N}^{N} \Bigl(a_k - \sum_{n=0}^{M} B_n b_{k-n}\Bigr)^2
  = R_a(0) - 2 \sum_{n=0}^{M} B_n R_{ba}(n) + \sum_{n,m=0}^{M} B_n B_m R_b(n - m).$$


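Using $B_n = A_n + \delta_n$ and dividing by $R_a(0)$ we find, denoting $I/R_a(0)$
by $V$, that

$$V = 1 - 2 \sum_{n=0}^{M} A_n \gamma_n + \sum_{n,m=0}^{M} A_n A_m r_{n-m} - 2 \sum_{n=0}^{M} \delta_n \gamma_n
  + 2 \sum_{n,m=0}^{M} A_n \delta_m r_{n-m} + \sum_{n,m=0}^{M} \delta_n \delta_m r_{n-m}.$$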
Using Eqs. (11), we find

$$V = 1 - \sum_{n=0}^{M} A_n \gamma_n + \sum_{n,m=0}^{M} \delta_n \delta_m r_{n-m}.$$

Using Eqs. (10), this becomes

$$V = 1 - E_M + \sum_{n,m=0}^{M} \delta_n \delta_m r_{n-m}. \tag{12}$$


Thus the effect of using $B_n$ instead of $A_n$ is to increase $V$ by

$$J = \sum_{n,m=0}^{M} \delta_n \delta_m r_{n-m}. \tag{13}$$
It is not difficult to show that $J \ge 0$. In fact

$$r_{n-m} = \frac{1}{R_a(0)} \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{k=-N}^{N} b_{k-n} b_{k-m}.$$

Thus, heuristically we have from Eqs. (13)

$$J = \frac{1}{R_a(0)} \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{k=-N}^{N} \sum_{n,m=0}^{M} \delta_n \delta_m b_{k-n} b_{k-m}
  = \frac{1}{R_a(0)} \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{k=-N}^{N} \Bigl(\sum_{n=0}^{M} \delta_n b_{k-n}\Bigr)^2.$$

Since a sum of squared terms cannot be negative, we have $J \ge 0$.
We have from Eqs. (12)

$$I = R_a(0)(1 - E_M + J).$$

Since $J \ge 0$ we see that the minimum value of $I$, $I_m$, is assumed when
$J = 0$. When $\delta_n = 0$ we have $J = 0$. Thus choosing the $A_n$ as the
solution of Eqs. (11) does cause $I$ to assume its minimum value.
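The identity $V = 1 - E_M + J$ can be checked numerically. The sketch below takes $M = 1$, so that Eqs. (11) can be solved by Cramer's rule, and perturbs the optimum weights by an arbitrary $\delta_n$. The numbers for $r_n$, $\gamma_n$ and $\delta_n$ are assumptions chosen only for illustration, and `normalized_error` is a hypothetical helper name.

```python
def normalized_error(B, r, gamma):
    """V = 1 - 2 sum_n B_n gamma_n + sum_{n,m} B_n B_m r_{n-m}."""
    size = len(B)
    linear = sum(B[n] * gamma[n] for n in range(size))
    quadratic = sum(B[n] * B[m] * r[abs(n - m)]
                    for n in range(size) for m in range(size))
    return 1.0 - 2.0 * linear + quadratic


# Assumed normalized correlations for M = 1.
r = [1.5, 0.8]
gamma = [1.0, 0.8]

# Optimum weights from Eqs. (11), a 2 x 2 system, by Cramer's rule.
det = r[0] ** 2 - r[1] ** 2
A = [(gamma[0] * r[0] - gamma[1] * r[1]) / det,
     (gamma[1] * r[0] - gamma[0] * r[1]) / det]
E_M = sum(a * g for a, g in zip(A, gamma))

# Weights actually realized, B_n = A_n + delta_n.
delta = [0.05, -0.03]
B = [a + d for a, d in zip(A, delta)]
J = sum(delta[n] * delta[m] * r[abs(n - m)]
        for n in range(2) for m in range(2))

V_A = normalized_error(A, r, gamma)  # equals 1 - E_M
V_B = normalized_error(B, r, gamma)  # equals 1 - E_M + J
print(V_A, V_B, J)
```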


3 Determination of the Weighting Function 


We now turn our attention to a method for solving Eqs. (11) for the
$A_n$. We recall that Eqs. (11) state

$$\sum_{n=0}^{M} A_n r_{k-n} = \gamma_k, \qquad k = 0, 1, \dots, M. \tag{11}$$


From the definition of $r_k = R_b(k)/R_a(0)$ we see that, since the $R_b(k)$
sequence is even, so is the $r_k$. This simplifies the process of solving Eqs.
(11) for the $A_n$ as we shall now see. In case $M = 3$, for example, we have

$$\begin{aligned}
A_0 r_0 + A_1 r_1 + A_2 r_2 + A_3 r_3 &= \gamma_0 \\
A_0 r_1 + A_1 r_0 + A_2 r_1 + A_3 r_2 &= \gamma_1 \\
A_0 r_2 + A_1 r_1 + A_2 r_0 + A_3 r_1 &= \gamma_2 \\
A_0 r_3 + A_1 r_2 + A_2 r_1 + A_3 r_0 &= \gamma_3.
\end{aligned}$$


Adding the first equation to the last, the second equation to the next to
last, and in the general case proceeding further in this way, we have

$$\begin{aligned}
(A_0 + A_3)(r_0 + r_3) + (A_1 + A_2)(r_1 + r_2) &= \gamma_0 + \gamma_3 \\
(A_0 + A_3)(r_1 + r_2) + (A_1 + A_2)(r_0 + r_1) &= \gamma_1 + \gamma_2.
\end{aligned}$$


This is a pair of equations for the two unknowns $A_0 + A_3$ and $A_1 + A_2$.
Again subtracting the last equation from the first and the next to the
last from the second, we get

$$\begin{aligned}
(A_0 - A_3)(r_0 - r_3) + (A_1 - A_2)(r_1 - r_2) &= \gamma_0 - \gamma_3 \\
(A_0 - A_3)(r_1 - r_2) + (A_1 - A_2)(r_0 - r_1) &= \gamma_1 - \gamma_2.
\end{aligned}$$


Here again we have two equations for the two unknowns $A_0 - A_3$ and
$A_1 - A_2$.

Solving each of these two systems of equations, we get $A_0 + A_3$,
$A_1 + A_2$, $A_0 - A_3$, and $A_1 - A_2$. Adding the first and the third of
these quantities and dividing by two we get $A_0$; subtracting, we get $A_3$.
Proceeding similarly with the second and fourth we find $A_1$ and $A_2$.

The amount of work in finding the $A_n$ is considerably reduced by this
device. Thus when $M = 7$ there are eight unknowns, and solving Eqs.
(11) is a formidable computation. By the procedure just discussed, this
case is reduced to solving two systems of equations each in four
unknowns. The latter problem is very tractable.

In case $M$ is even, the device still works in a slightly modified form.
Here we add the middle equation to itself and subtract it from itself.
We then obtain two systems of equations, one with $(M/2) + 1$ and the
other with $M/2$ unknowns.
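The case $M = 3$ can be sketched directly: two $2 \times 2$ systems are solved for the sums and differences, and the four weights are then recovered. The helper names and the numerical values of $r_k$ and $\gamma_k$ below are assumptions made for illustration only.

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e, c*x + d*y = f by Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det


def solve_even_toeplitz_4(r, gamma):
    """Solve Eqs. (11) for M = 3 using the sum and difference systems."""
    r0, r1, r2, r3 = r
    g0, g1, g2, g3 = gamma
    # Unknowns A0 + A3 and A1 + A2.
    s03, s12 = solve_2x2(r0 + r3, r1 + r2, r1 + r2, r0 + r1,
                         g0 + g3, g1 + g2)
    # Unknowns A0 - A3 and A1 - A2.
    d03, d12 = solve_2x2(r0 - r3, r1 - r2, r1 - r2, r0 - r1,
                         g0 - g3, g1 - g2)
    return [(s03 + d03) / 2, (s12 + d12) / 2,
            (s12 - d12) / 2, (s03 - d03) / 2]


# Assumed normalized correlations.
r = [1.5, 0.8, 0.64, 0.512]
gamma = [1.0, 0.8, 0.64, 0.512]
A = solve_even_toeplitz_4(r, gamma)

# Residuals of the original four equations; all should be near zero.
residuals = [sum(A[n] * r[abs(k - n)] for n in range(4)) - gamma[k]
             for k in range(4)]
print(A, residuals)
```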

We now turn to a procedure for getting the $A_n$ by an iteration process.
The method which we are about to develop will give some insight into the
number of weights $A_n$ needed for a good filter.

We have found that a measure of the effectiveness of the filter output

$$\sum_{k=0}^{M} A_k b_{n-k} \tag{14}$$

in representing the message $a_n$ was given by

$$E_M = \sum_{k=0}^{M} A_k \gamma_k.$$

The closer $E_M$ is to one the more effectively (14) represents $a_n$.

It is an important practical question to decide how large to make $M$.
Unless $E_M$ increases appreciably when $M$ is increased, it is not worth
while to increase $M$. In practice this makes desirable a procedure which
gives us $E_1$, $E_2$, $E_3$, etc., without undue computational difficulty. To
distinguish between the various values $A_n$ assumes as $M$ changes, we
introduce the more specific notation, $A_n^{(M)}$.

Thus Eqs. (11) and (10) become

$$\sum_{n=0}^{M} A_n^{(M)} r_{k-n} = \gamma_k, \qquad k = 0, 1, \dots, M, \tag{15}$$

and

$$E_M = \sum_{n=0}^{M} A_n^{(M)} \gamma_n. \tag{16}$$


We shall now set up an iterative process by means of which we can
proceed easily from $A_k^{(M)}$ to $A_k^{(M+1)}$. We introduce first an auxiliary
sequence $C_k^{(M)}$ which we specify as follows:

$$C_0^{(0)} = \frac{r_1}{r_0},$$

$$C_0^{(M)} \Bigl(r_0 - \sum_{k=0}^{M-1} C_k^{(M-1)} r_{M-k}\Bigr) = r_{M+1} - \sum_{k=1}^{M} C_{k-1}^{(M-1)} r_k, \tag{17}$$

$$C_k^{(M)} = C_{k-1}^{(M-1)} - C_0^{(M)} C_{M-k}^{(M-1)}, \qquad k = 1, 2, \dots, M. \tag{18}$$

Thus knowing $C_k^{(M-1)}$, $k = 0, 1, \dots, M - 1$, we are able from Eq. (17)
to find $C_0^{(M)}$ and then from Eq. (18) to find $C_k^{(M)}$, $k = 1, \dots, M$.
Having determined the $C_k^{(M)}$, we find $A_k^{(M+1)}$ from the $C_k^{(M)}$
and $A_k^{(M)}$ by use of:


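$$A_0^{(0)} = \frac{\gamma_0}{r_0},$$

$$A_{M+1}^{(M+1)} \Bigl(r_0 - \sum_{k=0}^{M} C_k^{(M)} r_{M+1-k}\Bigr) = \gamma_{M+1} - \sum_{k=0}^{M} A_k^{(M)} r_{M+1-k}. \tag{19}$$

Also

$$A_k^{(M+1)} = A_k^{(M)} - C_k^{(M)} A_{M+1}^{(M+1)}, \qquad k = 0, 1, \dots, M. \tag{20}$$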


We also have

$$E_{M+1} = E_M + A_{M+1}^{(M+1)} \Bigl(\gamma_{M+1} - \sum_{k=0}^{M} C_k^{(M)} \gamma_k\Bigr). \tag{21}$$

Equation (21) is an immediate consequence of using Eq. (20) in the
formula (16) for $E_{M+1}$. Thus on ascertaining $C_k^{(M)}$ and $A_{M+1}^{(M+1)}$
we can find at once how much larger $E_{M+1}$ will be compared with $E_M$.
This can be done even before using Eq. (20) to compute $A_k^{(M+1)}$,
$k \le M$.
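The recursion of Eqs. (17) to (21) can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function name `levinson` and the test correlations (a message with $R_a(n) = \rho^{|n|}$ in uncorrelated white noise, normalized so that $R_a(0) = 1$) are assumptions. The bracket on the left of Eq. (19) is the same quantity that appears in Eq. (17) at the next order, so it is computed once and reused.

```python
def levinson(r, gamma):
    """Solve sum_n A_n r_{k-n} = gamma_k, k = 0..M, by Eqs. (17)-(21).

    r and gamma each hold M + 1 values (indices 0..M), r being even.
    Returns the final weights A^(M) and the list [E_0, E_1, ..., E_M].
    """
    M_final = len(gamma) - 1
    C = [r[1] / r[0]] if M_final >= 1 else []   # C^(0)
    A = [gamma[0] / r[0]]                        # A^(0)
    E = [A[0] * gamma[0]]                        # E_0
    for M in range(M_final):
        # Bracket shared by Eq. (19) and by Eq. (17) at order M + 1.
        bracket = r[0] - sum(C[k] * r[M + 1 - k] for k in range(M + 1))
        # Eq. (19): the new last weight.
        a_new = (gamma[M + 1]
                 - sum(A[k] * r[M + 1 - k] for k in range(M + 1))) / bracket
        # Eq. (21): gain in E, available before the other weights change.
        E.append(E[-1] + a_new * (gamma[M + 1]
                                  - sum(C[k] * gamma[k] for k in range(M + 1))))
        # Eq. (20): update the earlier weights.
        A = [A[k] - C[k] * a_new for k in range(M + 1)] + [a_new]
        if M + 1 < M_final:
            # Eqs. (17) and (18) with M replaced by M + 1.
            order = M + 1
            c0 = (r[order + 1]
                  - sum(C[k - 1] * r[k] for k in range(1, order + 1))) / bracket
            C = [c0] + [C[k - 1] - c0 * C[order - k]
                        for k in range(1, order + 1)]
    return A, E


# Assumed example: rho^|n| message in white noise of variance 0.5.
rho, noise_var = 0.8, 0.5
r = [rho ** n + (noise_var if n == 0 else 0.0) for n in range(6)]
gamma = [rho ** n for n in range(6)]
A, E = levinson(r, gamma)
print(A)
print(E)  # E_0, E_1, ..., E_5: shows how little is gained by larger M
```

Printing the successive $E_M$ gives a direct way to judge when a further increase of $M$ is no longer worth while.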

We now proceed with the proofs of Eqs. (19) and (20). We must show
that with the $A_k^{(M+1)}$ determined from these equations, Eq. (15) is
satisfied. First let us see what Eq. (20) gives when used in Eq. (15) with
$M$ replaced by $M + 1$. We have

$$\sum_{n=0}^{M} \bigl(A_n^{(M)} - C_n^{(M)} A_{M+1}^{(M+1)}\bigr) r_{k-n} + A_{M+1}^{(M+1)} r_{k-M-1} = \gamma_k,
  \qquad k = 0, 1, \dots, M + 1. \tag{22}$$
Using Eq. (15), we obtain from the above with $k < M + 1$,

$$\sum_{n=0}^{M} C_n^{(M)} r_{k-n} = r_{M+1-k}, \qquad k = 0, 1, \dots, M. \tag{23}$$

For $k = M + 1$ we get Eq. (19) from Eq. (22). Thus Eqs. (19) and (20)
hinge on Eq. (23), which we proceed to prove by induction.
We now use Eq. (18) in Eq. (23) and obtain

$$\sum_{n=1}^{M} \bigl(C_{n-1}^{(M-1)} - C_0^{(M)} C_{M-n}^{(M-1)}\bigr) r_{k-n} + C_0^{(M)} r_k = r_{M+1-k},
  \qquad k = 0, 1, \dots, M.$$


This can be written as

$$\sum_{n=0}^{M-1} C_n^{(M-1)} r_{k-1-n} = r_{M+1-k} + C_0^{(M)} \Bigl(\sum_{n=1}^{M} C_{M-n}^{(M-1)} r_{k-n} - r_k\Bigr),
  \qquad k = 1, 2, \dots, M, \tag{24}$$

and Eq. (17), which is the case $k = 0$. But Eq. (23) with $M$ replaced by
$M - 1$ reduces Eq. (24) to

$$\sum_{n=1}^{M} C_{M-n}^{(M-1)} r_{k-n} = r_k, \qquad k = 1, 2, \dots, M. \tag{25}$$

Replacing $n$ by $M - m$, Eq. (25) becomes

$$\sum_{m=0}^{M-1} C_m^{(M-1)} r_{k-M+m} = r_k, \qquad k = 1, 2, \dots, M. \tag{26}$$

If we set $M - k = j$ in Eq. (26) we get

$$\sum_{m=0}^{M-1} C_m^{(M-1)} r_{m-j} = r_{M-j}, \qquad j = 0, 1, \dots, M - 1. \tag{27}$$

But Eq. (27) is the same as Eq. (23) with the index $M$ replaced by
$M - 1$. Thus Eq. (23) is true for the index $M$ if it is true for $M - 1$.
By induction, therefore, the validity of Eq. (23) is reduced to the case
$M = 0$, $C_0^{(0)} r_0 = r_1$. This last equation is satisfied, being in fact the
first equation of Eqs. (17).


4 Realization of Operator—Mathematical Formulation 


It is convenient here again to regard the signal as a function of time,
$f(t)$. The information which is to be extracted from the signal we again
denote by $F(t)$, $F(t)$ being as close as possible to $g(t)$.

We shall consider the nature of a four-terminal linear passive network
which, when its input voltage is $f(t)$, has $F(t)$ as its open-circuit output
voltage. The complication inherent in the construction of satisfactory
inductive elements makes the use of RC networks common. For this
reason and for the sake of simplicity we shall here restrict ourselves to
RC networks.

We denote by $A(t)$ the open-circuit output voltage of the network
corresponding to an input voltage which is a unit-step function. We
recall the relationship

$$F(t) = \int_0^\infty A'(\tau) f(t - \tau)\, d\tau + A(0) f(t).$$


We also shall require the Laplace transform of $A(t)$. Here we shall
denote by $k(p)$ the Laplace transform of $A(t)$ multiplied by $p$. The
function $k(p)$ is a transfer function. It is given by

$$k(p) = A(0) + \int_0^\infty A'(t) e^{-pt}\, dt. \tag{28}$$


In the case of a network free of inductances $k(p)$ is a rational function
having its poles on the negative real axis of the complex $p$-plane.
Moreover the poles of $k(p)$ are simple, and $k(p)$ is representable by a partial
fraction expansion

$$k(p) = \alpha_0 + \sum_{m=1}^{K} \frac{\alpha_m}{p + \sigma_m}, \tag{29}$$

where $0 < \sigma_1 < \sigma_2 < \cdots < \sigma_K$ and the $\alpha_m$ are real. From Eq. (28) we
see that Eq. (29) is satisfied if $\alpha_0 = A(0)$ and if

$$A'(t) = \sum_{m=1}^{K} \alpha_m e^{-\sigma_m t}. \tag{30}$$

We recall that in Eqs. (2) we used $hA'(nh) = A_n$. Thus we want $A_n$ to be
equal to

$$h \sum_{m=1}^{K} \alpha_m e^{-nh\sigma_m}, \qquad n > 0. \tag{31}$$


The $A_n$, we recall, are already determined as in Sec. 3. Here we are trying
to find $k(p)$. In general, to meet the requirement that $A_n$ be given by
Eqs. (31) exactly would require a filter of unwarranted complexity.
Therefore, setting

$$B_n = h \sum_{m=1}^{K} \alpha_m e^{-nh\sigma_m}, \qquad n > 0, \tag{32}$$


we require that $A_n - B_n$ be small. The extent to which we can make $B_n$
close to $A_n$ depends on how large we make $K$. If the successive $A_n$ form a
slowly changing sequence, $K$ can be chosen much smaller than $M$. We
recall that $M$ is the number of $A_n$, $n > 0$. On the other hand, if the $A_n$
change markedly from one value of $n$ to the next it may be necessary to
take $K$ as large as $M$. The smaller $K$ can be made, the simpler the filter.
A plot of the values of $A_n$ against $n$ should be of great help in deciding
how much the $A_n$ fluctuate between successive values of $n$ and therefore
how large to take $K$.

Our problem is really one in approximation. We want to make Eqs.
(31) represent $A_n$ as closely as possible. We have at our disposal the
choice of the $\sigma_m$ and the $\alpha_m$. It is inadvisable to choose $\sigma_{m+1}/\sigma_m$ close to
one since this introduces extremes in the sizes of the elements of the filter
associated with $k(p)$. It is also inadvisable to take $\sigma_K/\sigma_1$ too large since
this has no influence on the $B_n$ except for small $n$.




In order to be able to carry our analysis further we shall now make the
assumption

$$\sigma_m = \frac{m\beta}{h}, \tag{33}$$

where $\beta$ is a scale factor. A reasonable choice for $\beta$ is 1. Under certain
conditions a value of ½ or 2 may provide a better fit of $B_n$ to $A_n$ for a
given value of $K$. The choice of $\sigma_m$ as given in Eq. (33) is arbitrary and a
quite different choice may be much more useful under certain conditions.
From here on, however, we shall proceed on the basis of the assumption
in Eq. (33).

We have now to determine the $\alpha_m$ so that the $B_n - A_n$ are small. The
effect of the terms $B_n - A_n$ on $V$ as given in Eqs. (13) appears to be

$$\sum_{n,m=1}^{M} (B_n - A_n)(B_m - A_m) r_{n-m}. \tag{34}$$


This, however, is not complete in the present case because the $B_n$ as
given by Eqs. (32) are not zero for $n > M$. To prevent these $B_n$, $n > M$,
from affecting the filter output, $F(t)$, too badly we require

$$\sum_{n=M+1}^{\infty} B_n^2$$

to be small. Recalling Eqs. (32), we can put this condition in more
convenient form by requiring that

$$h \int_{Mh}^{\infty} [A'(\tau)]^2\, d\tau \tag{35}$$


be small, where $A'(\tau)$ is given by Eqs. (30). Combining this with Eq. (34),
we choose the $\alpha_m$ so as to minimize

$$J = \sum_{n,m=1}^{M} (B_n - A_n)(B_m - A_m) r_{n-m} + \lambda h \int_{Mh}^{\infty} [A'(\tau)]^2\, d\tau. \tag{36}$$

The value $\lambda$ in Eq. (36) represents a positive number that is chosen large
if it is important to make the influence of $f(t - \tau)$ on $F(t)$ small for
$\tau > Mh$.

To see this, we recall that by Eq. (1)

$$F(t) = \int_0^{Mh} A'(\tau) f(t - \tau)\, d\tau + A(0) f(t) + H,$$

where

$$H = \int_{Mh}^{\infty} A'(\tau) f(t - \tau)\, d\tau.$$

Choosing $\lambda$ large emphasizes Expression (35) in $J$, and thus, when the
$\alpha_m$ are determined by minimizing $J$, Term (35) will be small. Making
Expression (35) small will make $H$ small, and therefore $F(t)$ is
determined mainly by $f(t - \tau)$, $0 \le \tau \le Mh$. Thus, as soon as $t > Mh$, $F(t)$
is largely independent of any aberrations in $f(t)$ that occurred when
$t < 0$.

In cases where this transient aspect is of little importance, $\lambda$ is given
a smaller value determined by the size of $r_n$ when $n$ is near $M$ in size.

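Once a value of $\lambda$ is decided upon we again have only the $\alpha_m$ to
determine. In terms of $\alpha_m$ we have, using Eqs. (30) in Eqs. (36),

$$J = \sum_{n,m=1}^{M} (B_n - A_n)(B_m - A_m) r_{n-m} + \lambda h \sum_{p,q=1}^{K} \frac{\alpha_p \alpha_q e^{-(\sigma_p + \sigma_q)Mh}}{\sigma_p + \sigma_q}. \tag{37}$$

To minimize $J$ we set $\partial J/\partial \alpha_s = 0$. Finding $\partial J/\partial \alpha_s$, we have

$$\frac{\partial J}{\partial \alpha_s} = 2 \sum_{n,m=1}^{M} (B_n - A_n) r_{n-m} \frac{\partial B_m}{\partial \alpha_s} + 2 \lambda h \sum_{p=1}^{K} \frac{\alpha_p e^{-(\sigma_p + \sigma_s)Mh}}{\sigma_p + \sigma_s}.$$

Or, setting $\partial J/\partial \alpha_s = 0$ and finding $\partial B_m/\partial \alpha_s = h e^{-mh\sigma_s}$ from Eqs. (32), we have

$$\sum_{n,m=1}^{M} (B_n - A_n) r_{n-m} e^{-mh\sigma_s} + \lambda \sum_{p=1}^{K} \frac{\alpha_p e^{-(\sigma_p + \sigma_s)Mh}}{\sigma_p + \sigma_s} = 0, \qquad s = 1, 2, \dots, K.$$

And again using Eqs. (32), we obtain

$$\sum_{p=1}^{K} \alpha_p \Bigl(h \sum_{n,m=1}^{M} e^{-nh\sigma_p - mh\sigma_s} r_{n-m} + \lambda \frac{e^{-(\sigma_p + \sigma_s)Mh}}{\sigma_p + \sigma_s}\Bigr)
  = \sum_{n,m=1}^{M} A_n r_{n-m} e^{-mh\sigma_s}, \qquad s = 1, \dots, K. \tag{38}$$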


Here we have $K$ equations to determine the $K$ numbers $\alpha_p$. We can write
Eqs. (38) as

$$\sum_{p=1}^{K} c_{ps} \alpha_p = d_s, \qquad s = 1, \dots, K, \tag{39}$$

where, using Eq. (33),

$$d_s = \frac{1}{h} \sum_{n,m=1}^{M} A_n r_{n-m} e^{-ms\beta}, \qquad s = 1, \dots, K,$$

and

$$c_{ps} = \frac{\lambda e^{-(p+s)\beta M}}{(p + s)\beta} + \sum_{n,m=1}^{M} e^{-(np + ms)\beta} r_{n-m}.$$

We have in Eqs. (39) a system of equations for determining the $\alpha_m$.
Once the $\alpha_m$ are determined we have the problem of choosing a network
for which $A'(\tau)$ is given by Eqs. (30), or what is equivalent, $k(p)$, by
Eqs. (29).
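The fitting step can be sketched numerically. The following builds the linear system obtained by setting $\partial J/\partial \alpha_s = 0$ in Eq. (36) for general decay rates $\sigma_m$, before any particular spacing of the $\sigma_m$ is assumed, and solves it for $K = 2$ by Cramer's rule. The overall scaling of the equations is our own choice, and the function name, the sample $r_j$, and the test weights are assumptions for illustration. With $\lambda = 0$ and weights $A_n$ that are exactly of the form (32), the fit should return the coefficients used to generate them.

```python
import math


def fit_exponential_weights(A, r, sigma, h, lam):
    """Coefficients alpha_1, alpha_2 minimizing J of Eq. (36) for K = 2.

    A[n - 1] holds A_n for n = 1..M, r[j] holds r_j for j = 0..M-1,
    sigma holds the two fixed decay rates, lam is the weight lambda.
    """
    M = len(A)
    assert len(sigma) == 2, "this sketch solves the 2 x 2 case only"
    idx = range(1, M + 1)

    def decay(n, s):
        return math.exp(-n * h * sigma[s])

    def c(p, s):
        fit = h * sum(decay(n, p) * decay(m, s) * r[abs(n - m)]
                      for n in idx for m in idx)
        rate = sigma[p] + sigma[s]
        return fit + lam * math.exp(-rate * M * h) / rate

    def d(s):
        return sum(A[n - 1] * r[abs(n - m)] * decay(m, s)
                   for n in idx for m in idx)

    c11, c12, c22 = c(0, 0), c(0, 1), c(1, 1)
    d1, d2 = d(0), d(1)
    det = c11 * c22 - c12 * c12
    return [(d1 * c22 - c12 * d2) / det, (c11 * d2 - c12 * d1) / det]


# Assumed data: weights generated exactly from Eq. (32) with known alphas.
h, sigma = 0.5, [1.0, 2.0]
alpha_true = [2.0, -1.0]
M = 6
r = [0.5 ** j for j in range(M)]
A = [h * sum(a * math.exp(-n * h * s) for a, s in zip(alpha_true, sigma))
     for n in range(1, M + 1)]

alpha_exact = fit_exponential_weights(A, r, sigma, h, lam=0.0)
alpha_damped = fit_exponential_weights(A, r, sigma, h, lam=5.0)
print(alpha_exact, alpha_damped)
```

A positive `lam` trades some accuracy of fit over $n \le M$ for a smaller tail of $A'(\tau)$ beyond $\tau = Mh$.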


5 RC Filter 


Here we begin by summing up the characteristics of a four-terminal
linear passive network with only resistors and condensers for its
elements.†

We denote $\sigma + j\omega$ by $p$. Let $Z_{11}(p)$ be the driving-point impedance
at the input terminals with the output an open circuit. Similarly $Z_{22}(p)$
is the open-circuit driving-point output-terminal impedance. Finally
$Z_{12}(p)$ is the transfer impedance between one end with the other an
open circuit. Using the subscripts $I$ and $0$ to denote the input and output
terminals respectively, we have the well-known relationship,

$$\begin{aligned}
V_I &= Z_{11} I_I + Z_{12} I_0, \\
V_0 &= Z_{12} I_I + Z_{22} I_0.
\end{aligned} \tag{40}$$


If $V_I$ is a complex number denoting the input voltage at some
frequency $\omega/2\pi$ and $V_0$ is the output voltage on open circuit, then it follows
from Eq. (40) that

$$V_0 = V_I \frac{Z_{12}(j\omega)}{Z_{11}(j\omega)}.$$

In terms of $p$ we have

$$V_0(p) = V_I(p) \frac{Z_{12}(p)}{Z_{11}(p)}. \tag{41}$$

In the notation of the previous section $V_0(p)$ is the Laplace transform
of $F(t)$ and $V_I(p)$ of $f(t)$. We have as the Laplace transform of Eq. (1),
$V_0(p) = k(p) \cdot V_I(p)$. Comparing this with Eq. (41), we get

$$\frac{Z_{12}(p)}{Z_{11}(p)} = k(p). \tag{42}$$

We have already determined $k(p)$. In this section we shall determine
$Z_{11}(p)$ and $Z_{12}(p)$ so that Eq. (42) is satisfied.
It is necessary and sufficient for $Z_{11}$, $Z_{22}$, and $Z_{12}$ to satisfy the
following criteria in order to characterize a four-terminal linear passive network
free of inductances:

1. $Z_{11}(p)$ [and $Z_{22}(p)$] have simple zeros and poles which lie on the
negative real axis in the $p$-plane. Zeros and poles separate each
other. Moreover the smallest pole lies to the right of the smallest
zero. For positive $p$, $Z_{11}(p)$ and $Z_{22}(p)$ are positive.

2. $Z_{12}(p)$ has simple poles and is real for real $p$.

3. A pole of $Z_{12}(p)$ must also be a pole of $Z_{11}(p)$ and $Z_{22}(p)$. In fact,
if $p = -\sigma'$ is a pole of $Z_{12}$ with residue $\alpha_{12}$ and if $\alpha_{11}$ and $\alpha_{22}$ are
the residues of $Z_{11}$ and $Z_{22}$ at $p = -\sigma'$, then it is necessary for

$$\alpha_{11} \alpha_{22} - \alpha_{12}^2 \ge 0. \tag{43}$$

† E. A. Guillemin, RC-Coupling Networks, RL Report 43, Oct. 11, 1944.


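We shall choose the poles of $Z_{12}$ coincident with those of $Z_{11}$.
Therefore, we see from Eq. (42) that the zeros of $Z_{12}(p)$ are the zeros of $k(p)$
and the zeros of $Z_{11}(p)$ are the poles of $k(p)$. The zeros of $Z_{11}$ are
therefore $-\sigma_k$ where $0 < \sigma_1 < \sigma_2 < \cdots < \sigma_K$. As the poles of $Z_{11}$ and $Z_{12}$
we choose $-\sigma_k'$, where

$$0 < \sigma_1' < \sigma_1 < \sigma_2' < \sigma_2 < \cdots < \sigma_K' < \sigma_K. \tag{44}$$

We can take

$$\sigma_1' = \tfrac{1}{2}\sigma_1, \quad \sigma_2' = \tfrac{1}{2}(\sigma_1 + \sigma_2), \quad \sigma_3' = \tfrac{1}{2}(\sigma_2 + \sigma_3), \quad \text{etc.}$$

Thus we have

$$Z_{11}(p) = e\, \frac{(p + \sigma_1)(p + \sigma_2) \cdots (p + \sigma_K)}{(p + \sigma_1')(p + \sigma_2') \cdots (p + \sigma_K')}, \tag{45}$$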
where $e$ is a positive constant that can be chosen to help make the size
of the elements of the network physically reasonable. Using Eq. (42),
we see that, having determined $Z_{11}(p)$, $Z_{12}(p)$ is given by

$$Z_{12}(p) = k(p) Z_{11}(p). \tag{46}$$

It is convenient to choose $Z_{22}(p) = Z_{11}(p)$, thus making the
four-terminal network symmetric. In general, such a choice may cause Eq.
(43) to be violated. This can be avoided by reducing $k(p)$ by some
constant factor, $g$, and then compensating for this reduction by any one of
several amplifying devices. Equation (43) then becomes

$$\alpha_{11}^2 - g^2 \alpha_{12}^2 \ge 0. \tag{47}$$

Clearly $g$ can be chosen so that this condition is satisfied at all the poles,
$-\sigma_k'$, of $Z_{12}$.
Having chosen

$$Z_{11}(p), \quad Z_{12}(p), \quad \text{and} \quad Z_{22}(p) \; [= Z_{11}(p)],$$

all to within an arbitrary coefficient $e$, we can now follow Guillemin† in
finding a four-terminal network with the desired characteristics. Briefly,
in the symmetric network shown in Fig. 1, it is necessary for

$$\begin{aligned}
Z_a &= Z_{11} - Z_{12}, \\
Z_b &= Z_{11} + Z_{12},
\end{aligned} \tag{48}$$

in order for the network to have $Z_{11}$, $Z_{12}$, and $Z_{22}$ as its open-circuit
impedances. In this manner the problem is reduced to finding $Z_a$ and $Z_b$.

[Fig. 1. Symmetric four-terminal network with branch impedances $Z_a$ and $Z_b$.]

The quantities $Z_a$ and $Z_b$ are two-terminal impedances. To see this we
observe first that (44) and (45) assure that

$$\alpha_{11m} > 0, \tag{49}$$

in the partial fraction expansion

$$Z_{11}(p) = e + \sum_{m=1}^{K} \frac{\alpha_{11m}}{p + \sigma_m'}. \tag{50}$$

† Op. cit.


From Eq. (46) we observe that, since the poles of $k(p)$ are canceled
by the zeros of $Z_{11}(p)$, we have

$$Z_{12}(p) = e\, k(\infty) + \sum_{m=1}^{K} \frac{\alpha_{12m}}{p + \sigma_m'}.$$

Using the modified $gk(p)$ in place of $k(p)$, this becomes

$$Z_{12}(p) = e g\, k(\infty) + \sum_{m=1}^{K} \frac{g \alpha_{12m}}{p + \sigma_m'}. \tag{51}$$

We see from Eq. (28), incidentally, that $k(\infty) = A(0)$.
Using Eqs. (50) and (51), we have

$$Z_a = Z_{11} - Z_{12} = e - e g\, k(\infty) + \sum_{m=1}^{K} \frac{\alpha_{11m} - g \alpha_{12m}}{p + \sigma_m'}.$$

By (47) and (49) we see that $\alpha_{11m} - g \alpha_{12m} \ge 0$. Moreover, by
further adjusting $g$ if necessary, $e - e g\, k(\infty) \ge 0$. Thus $Z_a$ is of the form

$$a_0 + \sum_{m=1}^{K} \frac{a_m}{p + \sigma_m'}, \tag{52}$$

where

$$a_0 \ge 0, \qquad a_m \ge 0,$$

which is precisely the necessary and sufficient condition for $Z_a$ to be a
two-terminal impedance containing resistances and condensers only.
The same argument applies to $Z_b$.

A two-terminal impedance as given by Eq. (52) will now be
constructed. Since a resistor $R$ in parallel with a condenser $C$ has an
impedance

$$\frac{1/C}{p + (1/RC)},$$

we see that by choosing $C = 1/a_m$ and $R = a_m/\sigma_m'$ we obtain an
impedance $a_m/(p + \sigma_m')$. Arranging $K$ of these in series and adding a
resistor $a_0$, we obtain a two-terminal impedance of the desired form.
With the choice of a large value of $e$ in Eq. (45) $a_m$ is large, and therefore
$1/a_m$, the capacitance, can be made reasonably small.
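The element values can be sketched in a few lines. Given the coefficients of Eq. (52), the code below forms the series resistor and the $K$ parallel RC sections and confirms, at one complex frequency, that the ladder reproduces the partial fraction expansion. The function names and the sample values of $a_0$, $a_m$ and $\sigma_m'$ are assumptions for illustration.

```python
def rc_elements(a0, a, sigma_prime):
    """Series resistor a0 and (R_m, C_m) pairs with R_m = a_m / sigma'_m, C_m = 1 / a_m."""
    sections = [(am / sm, 1.0 / am) for am, sm in zip(a, sigma_prime)]
    return a0, sections


def ladder_impedance(p, series_resistor, sections):
    """Impedance of the resistor in series with the parallel RC sections."""
    z = series_resistor
    for R, C in sections:
        z += 1.0 / (1.0 / R + p * C)  # R in parallel with C
    return z


# Assumed coefficients of Eq. (52).
a0, a, sigma_prime = 0.2, [3.0, 1.5], [0.5, 2.0]
series_resistor, sections = rc_elements(a0, a, sigma_prime)

p = 0.7 + 2.0j
z_network = ladder_impedance(p, series_resistor, sections)
z_formula = a0 + sum(am / (p + sm) for am, sm in zip(a, sigma_prime))
print(sections, z_network, z_formula)
```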


6 Prediction and Lag with and without Noise 


In Sec. 2 the problem of separating a message, represented by a
sequence $a_k$, from a signal, represented by a sequence $b_k$, was considered.
The sequence $b_k - a_k$ is called noise. There the optimum set of numbers
$A_n$ was determined in order that $a_k$ should be represented as closely as
possible by

$$\sum_{n=0}^{M} A_n b_{k-n}. \tag{53}$$

In Eq. (53) we utilize $b_k$ and earlier values such as $b_{k-1}$, $b_{k-2}$, etc., in
deriving $a_k$. There are situations where on the basis of knowing $b_k$, $b_{k-1}$,
$b_{k-2}$, etc., we must use Eq. (53) to represent not $a_k$ but $a_{k+s}$, where $s$ is a
positive integer. Here we have a problem involving not only filtering,
that is, the separation of message from noise, but also prediction. In
other words, even if there were no noise, there would still be the problem
of determining $a_{k+s}$ from a knowledge of $a_k$, $a_{k-1}$, etc. This problem arises
in fire control where it is necessary to point a gun not at where the target
is but at where it is likely to be by the time the shell arrives.

Proceeding as in Sec. 2, we now choose the $A_n$ so as to minimize the
rms of

$$\epsilon_k = a_{k+s} - \sum_{n=0}^{M} A_n b_{k-n}. \tag{54}$$

Instead of Eq. (5) we find

$$I = R_a(0) - 2 \sum_{n=0}^{M} A_n R_{ba}(n + s) + \sum_{n,m=0}^{M} A_n A_m R_b(m - n).$$

Minimizing $I$, we obtain

$$\sum_{n=0}^{M} A_n r_{k-n} = \gamma_{k+s}, \qquad k = 0, 1, \dots, M, \tag{55}$$

in place of Eq. (11), where $r_k$ and $\gamma_k$ are defined as in Sec. 2.
In determining the effectiveness of Eq. (53) in representing $a_{k+s}$, we
get now, instead of Eqs. (9) and (10),

$$V = 1 - E_M, \tag{56}$$

$$E_M = \sum_{n=0}^{M} A_n \gamma_{n+s}. \tag{57}$$

The method given at the beginning of Sec. 3 for solving two systems
each of about half the order of Eq. (11) in place of Eq. (11) applies
equally well to Eq. (55). The iteration formulas given in Sec. 3 can also
be generalized to cover the case of predicting together with filtering,
and we now turn to this problem.

In place of Eqs. (55) and (57) we have, in more explicit notation,

$$\sum_{n=0}^{M} A_n^{(M)} r_{k-n} = \gamma_{k+s}, \qquad k = 0, 1, \dots, M, \tag{58}$$

$$E_M = \sum_{n=0}^{M} A_n^{(M)} \gamma_{n+s}. \tag{59}$$


We observe that the only difference between these equations and Eqs.
(15) and (16) is in the index of $\gamma$, which is now increased by $s$. Thus the
only change in Eqs. (17) to (21) is an increase in the index of $\gamma$ by the
number $s$. Equations (17) and (18) remain unchanged since they do not
contain $\gamma$, and we rewrite them as before:

$$C_0^{(M)} \Bigl(r_0 - \sum_{k=0}^{M-1} C_k^{(M-1)} r_{M-k}\Bigr) = r_{M+1} - \sum_{k=1}^{M} C_{k-1}^{(M-1)} r_k, \tag{60}$$

$$C_k^{(M)} = C_{k-1}^{(M-1)} - C_0^{(M)} C_{M-k}^{(M-1)}, \qquad k = 1, 2, \dots, M. \tag{61}$$

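Equation (19) is modified to

$$A_{M+1}^{(M+1)} \Bigl(r_0 - \sum_{n=0}^{M} C_n^{(M)} r_{M+1-n}\Bigr) = \gamma_{M+1+s} - \sum_{n=0}^{M} A_n^{(M)} r_{M+1-n}, \tag{62}$$

whereas Eq. (20) remains unchanged as

$$A_k^{(M+1)} = A_k^{(M)} - C_k^{(M)} A_{M+1}^{(M+1)}, \qquad k = 0, 1, \dots, M. \tag{63}$$

In place of Eq. (21) we have

$$E_{M+1} = E_M + A_{M+1}^{(M+1)} \Bigl(\gamma_{M+1+s} - \sum_{k=0}^{M} C_k^{(M)} \gamma_{k+s}\Bigr). \tag{64}$$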

It is extremely useful to observe that, by Eqs. (60) and (61), the
$C_k^{(M)}$ are independent of the choice of $s$. It is also helpful to observe that
the same combination appears in the bracket in the left-hand side of
Eqs. (60) and (62) except for the index $M$ in one case and $M + 1$ in the
other. The $A_k^{(M)}$ depend of course on $s$. In some cases the range of
prediction is taken as far ahead as is possible without causing $E_M$ to fall
below some preassigned value. Under such conditions the $A_k^{(M)}$ and
$E_M$ must be recomputed for several choices of $s$. The fact that $C_k^{(M)}$ is
independent of $s$ greatly facilitates the computation.
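A sketch of the shifted recursion follows, under stated assumptions. The function name `levinson_shifted` and the callable `gamma_of` are hypothetical. The check uses a noise-free message with $r_n = \gamma_n = \rho^{|n|}$, for which the best prediction $s$ steps ahead is known in closed form: $A_0 = \rho^s$, all other weights zero, and $E_M = \rho^{2s}$.

```python
def levinson_shifted(r, gamma_of, s, M_final):
    """Solve sum_n A_n r_{k-n} = gamma_{k+s}, k = 0..M_final, by Eqs. (60)-(64).

    r holds r_0..r_{M_final}; gamma_of(n) returns gamma_n for any integer n.
    Returns the weights A^(M_final) and the list [E_0, ..., E_{M_final}].
    """
    C = [r[1] / r[0]] if M_final >= 1 else []
    A = [gamma_of(s) / r[0]]
    E = [A[0] * gamma_of(s)]
    for M in range(M_final):
        bracket = r[0] - sum(C[k] * r[M + 1 - k] for k in range(M + 1))
        # Eq. (62): only the gamma index differs from the pure filtering case.
        a_new = (gamma_of(M + 1 + s)
                 - sum(A[k] * r[M + 1 - k] for k in range(M + 1))) / bracket
        # Eq. (64).
        E.append(E[-1] + a_new * (gamma_of(M + 1 + s)
                                  - sum(C[k] * gamma_of(k + s)
                                        for k in range(M + 1))))
        # Eq. (63).
        A = [A[k] - C[k] * a_new for k in range(M + 1)] + [a_new]
        if M + 1 < M_final:
            # Eqs. (60) and (61); these do not involve s at all.
            order = M + 1
            c0 = (r[order + 1]
                  - sum(C[k - 1] * r[k] for k in range(1, order + 1))) / bracket
            C = [c0] + [C[k - 1] - c0 * C[order - k]
                        for k in range(1, order + 1)]
    return A, E


# Assumed noise-free example with r_n = gamma_n = rho^|n|.
rho, s, M_final = 0.8, 2, 4
r = [rho ** n for n in range(M_final + 1)]
A, E = levinson_shifted(r, lambda n: rho ** abs(n), s, M_final)
print(A, E[-1])
```

Because the $C_k^{(M)}$ do not depend on $s$, a practical computation over several values of $s$ could compute them once and reuse them; the sketch recomputes them on each call only for brevity.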

So far, for the sake of being definite, we have considered the case
$s > 0$ and discussed the prediction problem. The case $s < 0$ also is of
considerable importance. By taking $s < 0$ it is possible to improve the
separation of message from noise. If such improvement turns out to be
appreciable and if the lag in obtaining the sequence $a_{k+s}$, $s < 0$, instead
of $a_k$, is not important, then of course it is worth while to take $s < 0$.
All the formulas given here are valid for any integer value of $s$, whether
positive or negative.


A HEURISTIC EXPOSITION OF WIENER’S MATHEMATICAL
THEORY OF PREDICTION AND FILTERING*


By Norman Levinson 


Consider the function of time $f(t)$ which is the sum of a function $g(t)$
and a disturbance, $f(t) - g(t)$. How can we best extract $g(t)$ from $f(t)$?
This is the problem of filtering.

More generally, how can we best determine $g(t + h)$ from $f(t - \tau)$,
$0 \le \tau < \infty$? If $h > 0$ we have here a problem in prediction as well as in
filtering.

In case $f(t) = g(t)$ then there is no problem of filtering, but there may
still be a prediction problem, that is, finding of $f(t + h)$ from $f(t - \tau)$,
$0 \le \tau < \infty$.†

An explicit solution to this problem was given by N. Wiener in 1942 in
a document not publicly available.‡ Here we shall present an expository
account of Wiener’s linear theory, making several minor departures from
Wiener’s procedure. Moreover we shall deal mainly with the analytic
rather than the statistical aspects.

The theory developed here will apply if the function, $f(t)$, possesses
an auto-correlation function,

$$\varphi(t) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau) f(\tau)\, d\tau;$$

if $g(t)$ possesses an auto-correlation function $\psi(t)$; and if the
cross-correlation function,

$$\chi(t) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} g(t + \tau) f(\tau)\, d\tau,$$

exists. The importance of the auto-correlation function will be seen from Sec. 1,
which follows. In case $f(t) = g(t)$ then $\chi(t) = \psi(t) = \varphi(t)$. We shall
further assume that $\varphi(t)$ and $\chi(t)$ are continuous and that each has a
Fourier transform. These requirements eliminate from the scope of the
theory such functions as $f(\tau) = 1$, $-\infty < \tau < \infty$, or $f(\tau) = \sin a\tau$,
$-\infty < \tau < \infty$. Each of these functions has a $\varphi(t)$ which does not tend
to zero as $|t| \to \infty$ and therefore has no Fourier transform. In fact,
these requirements exclude all elementary functions. However, all the
elementary functions are perfectly predictable, and therefore their
exclusion involves no real loss.

* Reprinted from Journal of Mathematics and Physics, Vol. XXVI, No. 2, July,
1947, pp. 110-119.

† For further background on this problem see the introduction in an earlier paper
of the author, The Wiener RMS Error Criterion in Filter Design and Prediction,
Journal of Mathematics and Physics, Vol. XXV, pp. 261-278, 1946.

‡ The book to which this article is appended appears here in its first publicly
available form. Ed.

It is necessary to subtract the perfectly predictable component from a
function, $f(t)$, before applying the theory presented here. Thus, if the
average of $f(t)$ is not zero, it should be subtracted from $f(t)$.


1 The Auto-correlation Function 


In the linear theory of prediction and filtering we attempt to express
$g(t + h)$ in terms of a linear combination of values of $f(t - \tau)$ where
$\tau \ge 0$. One way of doing this would be to select several values of $\tau$,
$\tau_n \ge 0$, and try to choose coefficients, $a_n$, so that

$$\sum_{n=0}^{N} a_n f(t - \tau_n) \tag{1.0}$$


gives an optimum prediction of $g(t + h)$. This procedure has much to
recommend it in practice and is easy to carry through. A more general
procedure is to attempt to predict the value of $f(t + h)$ by means of

$$\int_0^\infty f(t - \tau)\, dK(\tau). \tag{1.1}$$


This latter expression involves only the past of $f(t)$ since $\tau \ge 0$ in (1.1).
Another example of an operation on the past of $f(t)$ is

$$\sum_{n=0}^{N} a_n f^{(n)}(t - \tau_n), \qquad \tau_n \ge 0, \tag{1.2}$$

where $f^{(n)}$ denotes the $n$th derivative of $f$.
It will be convenient for purposes of exposition to use the form

$$\int_0^\infty f(t - \tau) K(\tau)\, d\tau, \tag{1.3}$$

which appears to be more restrictive than (1.1) and not to include (1.2).
Actually the method of treatment used will be such that the result will
come out as an operator on $f(t - \tau)$. This operator may be of the form of
(1.3), but it also can be more general in nature and can include the cases
(1.1) and (1.2). Thus our assumed form (1.3) is no real restriction.
The question we ask is: How shall we choose the operator $K(\tau)$ so
that, for a prescribed $h \ge 0$,

$$g(t + h) - \int_0^\infty f(t - \tau) K(\tau)\, d\tau \tag{1.4}$$

is as small as possible? Before answering this question we must decide
what we mean by the phrase “as small as possible.” Here we shall mean
that the average (with respect to $t$) of the square of (1.4) should be a
minimum. To be precise, let

$$I[K] = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \Bigl[g(t + h) - \int_0^\infty f(t - \tau) K(\tau)\, d\tau\Bigr]^2 dt. \tag{1.5}$$

Our question now becomes: What choice of $K(\tau)$ will make $I$ a minimum?
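If we expand the right member of (1.5) we get, inverting limits freely,

$$\begin{aligned}
I[K] ={}& \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} g^2(t + h)\, dt \\
&- 2 \int_0^\infty K(\tau)\, d\tau \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} g(t + h) f(t - \tau)\, dt \\
&+ \int_0^\infty K(\tau_1)\, d\tau_1 \int_0^\infty K(\tau_2)\, d\tau_2 \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t - \tau_1) f(t - \tau_2)\, dt,
\end{aligned} \tag{1.6}$$

the last term of which arises from the

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \Bigl(\int_0^\infty K(\tau) f(t - \tau)\, d\tau\Bigr)^2 dt. \tag{1.7}$$

Incidentally, since (1.7) is non-negative, it follows that the last term
of (1.6) is also non-negative.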

The right-hand member of (1.6) becomes considerably simplified if
we introduce the correlation functions $\varphi$, $\chi$, and $\psi$. A consequence
of the existence of $\varphi(t)$ is that for any $a$, $b$, and $c$

$$\varphi(b - c) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T+a}^{T+a} f(t + b) f(t + c)\, dt.$$

A similar result is true for $\chi$ and $\psi$. This last result can be used to show
that $\varphi(t)$ and $\psi(t)$ are even functions. We also use it to rewrite (1.6) as

$$I[K] = \psi(0) - 2 \int_0^\infty K(\tau) \chi(h + \tau)\, d\tau
  + \int_0^\infty K(\tau_1)\, d\tau_1 \int_0^\infty K(\tau_2) \varphi(\tau_1 - \tau_2)\, d\tau_2. \tag{1.8}$$

Since our problem is to find $K(\tau)$ so as to minimize $I$, we see from (1.8)
that the question of what $K$ to choose so as to get an optimum value for
$g(t + h)$ does not depend on $f(t)$ and $g(t)$ directly but rather on $\varphi(t)$ and
$\chi(t)$, the correlation functions. This is a most important point. Finding
$K$ depends on knowing two statistical functions of $f$ and $g$ rather than
on knowing $f$ and $g$ themselves. If we find two ensembles of functions
$\{f(t)\}$ and $\{g(t)\}$, having the same correlation functions, $\varphi$, $\psi$, and $\chi$,
then we can choose a $K(\tau)$ that will give us the best prediction of
$g(t + h)$ in terms of the past of $f(t)$ for every $g$ and $f$ in the respective
ensembles.

Since the last term in (1.6) is non-negative it follows that the last
term in (1.8) to which it is equal must also be non-negative. Thus, for
any $K$ and any auto-correlation function,

$$\int_0^\infty K(\tau_1)\, d\tau_1 \int_0^\infty K(\tau_2) \varphi(\tau_1 - \tau_2)\, d\tau_2 \ge 0. \tag{1.9}$$


2 The Integral Equation 


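If $K(\tau)$ actually makes $I$ a minimum this means that replacing $K(\tau)$
by $K(\tau) + \epsilon M(\tau)$, where $\epsilon$ is a real number and $M(\tau)$ is a function of $\tau$,
cannot decrease $I$. That is

$$I[K + \epsilon M] \ge I[K].$$

From (1.8) we have

$$\begin{aligned}
I[K + \epsilon M] ={}& I[K] - 2\epsilon \int_0^\infty \chi(\tau + h) M(\tau)\, d\tau \\
&+ 2\epsilon \int_0^\infty M(\tau)\, d\tau \int_0^\infty K(\tau_1) \varphi(\tau - \tau_1)\, d\tau_1 \\
&+ \epsilon^2 \int_0^\infty M(\tau)\, d\tau \int_0^\infty M(\tau_1) \varphi(\tau - \tau_1)\, d\tau_1.
\end{aligned}$$

Or

$$I[K + \epsilon M] = I[K] - 2\epsilon J_1 + \epsilon^2 J_2, \tag{2.0}$$

where

$$J_1 = \int_0^\infty M(\tau) \Bigl[\chi(\tau + h) - \int_0^\infty \varphi(\tau - \tau_1) K(\tau_1)\, d\tau_1\Bigr] d\tau$$

and

$$J_2 = \int_0^\infty M(\tau)\, d\tau \int_0^\infty M(\tau_1) \varphi(\tau - \tau_1)\, d\tau_1.$$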

Now, if for some $M(\tau)$, $J_1 \ne 0$, then, by changing the sign of $M(\tau)$ if
necessary, we have $J_1 > 0$. Writing (2.0) as

$$I[K + \epsilon M] = I[K] - 2\epsilon (J_1 - \tfrac{1}{2} \epsilon J_2), \tag{2.1}$$

we see that since $J_1 > 0$ we can, by making $\epsilon$ positive and small enough, make
$J_1 - \tfrac{1}{2} \epsilon J_2 > 0$. Thus (2.1) gives us

$$I[K + \epsilon M] < I[K],$$

which is impossible. Therefore $J_1 = 0$ for any $M(\tau)$.
Clearly we will have $J_1 = 0$ for any $M(\tau)$ if

$$\chi(t + h) - \int_0^\infty \varphi(t - \tau) K(\tau)\, d\tau = 0, \qquad t \ge 0. \tag{2.2}$$

It is important to note that (2.2) need hold only for $t \ge 0$ since $M(\tau) = 0$,
$\tau < 0$. Conversely, the fact that $J_1 = 0$ for any $M(\tau)$ implies (2.2).
Thus, if $K(\tau)$ minimizes $I[K]$, then (2.2) must hold.

With (2.2) valid, $J_1 = 0$ and (2.0) becomes

$$I[K + \epsilon M] = I[K] + \epsilon^2 J_2.$$

As we saw in (1.9), $J_2 \ge 0$. This implies that $I[K + \epsilon M] \ge I[K]$. We see
then that (2.2) is not only a necessary but is also a sufficient condition
for $I[K]$ to be a minimum. The problem has thus been reduced to the
solution of the integral equation (2.2) for the function $K$.§
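The expansion (2.0) is purely algebraic, so it survives discretization and can be checked with Riemann sums. The sketch below is an illustration under assumptions: the grid step, the truncation of the integrals to a finite range, the choice $\varphi(t) = e^{-|t|}$ with $f = g$ (so that $\chi = \varphi$ and $\psi(0) = 1$), and the trial functions $K$ and $M$ are all chosen arbitrarily. The name `mean_square_error` is hypothetical.

```python
import math


def mean_square_error(K, chi, phi, psi0, step):
    """Riemann-sum version of Eq. (1.8) on the grid tau_i = i * step.

    chi[i] holds chi(h + tau_i); phi[i] holds phi(tau_i).
    """
    size = len(K)
    cross = step * sum(K[i] * chi[i] for i in range(size))
    quad = step ** 2 * sum(K[i] * K[j] * phi[abs(i - j)]
                           for i in range(size) for j in range(size))
    return psi0 - 2.0 * cross + quad


# Assumed example: phi(t) = exp(-|t|), pure prediction (f = g), lead h.
step, h, size = 0.1, 0.3, 40
phi = [math.exp(-i * step) for i in range(size)]
chi = [math.exp(-(h + i * step)) for i in range(size)]
K = [math.exp(-2.0 * i * step) for i in range(size)]  # arbitrary trial K
Mvar = [math.cos(i * step) for i in range(size)]      # arbitrary variation M
eps = 0.05

# Discrete counterparts of J_1 and J_2 in Eq. (2.0).
J1 = step * sum(
    Mvar[i] * (chi[i] - step * sum(phi[abs(i - j)] * K[j] for j in range(size)))
    for i in range(size))
J2 = step ** 2 * sum(Mvar[i] * Mvar[j] * phi[abs(i - j)]
                     for i in range(size) for j in range(size))

perturbed = [k + eps * m for k, m in zip(K, Mvar)]
lhs = mean_square_error(perturbed, chi, phi, 1.0, step)
rhs = mean_square_error(K, chi, phi, 1.0, step) - 2.0 * eps * J1 + eps ** 2 * J2
print(lhs, rhs, J2)
```

On such a grid, condition (2.2) becomes a finite linear system of the same Toeplitz type as the discrete equations of the preceding appendix.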


3 The Modified Integral Equation 


Since the second term in (2.2) is in the form of a convolution, it is
natural to conclude that this equation can be solved for $K(\tau)$ by use of
the Fourier transform theorem. However, because of the requirement
that (2.2) holds only for $t \ge 0$, this conclusion is false. To see precisely
why the Fourier transform does not work, let us try it.

Multiplying both sides of (2.2) by $e^{iut}$ and integrating for $t > 0$, we
have

$$\int_0^\infty \chi(t + h) e^{iut}\, dt = \int_0^\infty e^{iut}\, dt \int_0^\infty K(\tau) \varphi(t - \tau)\, d\tau. \tag{3.0}$$

But the right member above is equal to

$$\int_0^\infty K(\tau)\, d\tau \int_0^\infty e^{iut} \varphi(t - \tau)\, dt.$$

Setting $t = s + \tau$, we get for (3.0)

$$\int_0^\infty \chi(t + h) e^{iut}\, dt = \int_0^\infty K(\tau) e^{iu\tau}\, d\tau \int_{-\tau}^{\infty} e^{ius} \varphi(s)\, ds. \tag{3.1}$$

§ In the integral equation (2.2), $K$ is unknown, and $\chi$ and $\varphi$ are known. The
equation might be called a Wiener-Hopf integral equation of the first kind. In the
Wiener-Hopf equation itself it is necessary to restrict the kernel $\varphi$ to be exponentially small
in magnitude. When transforms are taken this provides a strip in the complex plane
in which to match up factors. In the present case no such strip is available, and, as
will be seen, factorization is carried out on the real axis of the complex plane. The
W-H equation of the first kind also arises in some problems of electromagnetic theory
as has been shown by Schwinger. (See for example, the Reflection of an
Electromagnetic Plane Wave by an Infinite Set of Plates I, by J. F. Carlson and A. E. Heins,
Quarterly of Applied Mathematics, Vol. 4, p. 313, January, 1947.) However in the
equation discussed by Carlson and Heins, there is available a strip of regularity for
matching factors. Therefore their method of solution follows that of the original
W-H equation.


Now in the usual case the limits in the last integral would not involve
$\tau$ but would be fixed. In that case the last equation could be solved
for

$$\int_0^\infty K(\tau)e^{iu\tau}\,d\tau,$$

from which $K(\tau)$ can be determined. Here, however, there is no
simplification. Nevertheless, since $\tau \geq 0$, notice that the last
integral would not involve $\tau$ if $\varphi(t) = 0$, $t < 0$. Of course
this last requirement is impossible (the autocorrelation $\varphi$ is an
even function), but the general idea can be exploited as we shall now
proceed to do.
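
As an illustration (the choice $\varphi(t) = e^{-|t|}$ is an assumption
made here, not taken from the text), the inner integral of (3.1) can be
evaluated in closed form for $\tau \geq 0$:

$$\int_{-\tau}^{\infty} e^{ius}e^{-|s|}\,ds = \int_{-\tau}^{0} e^{(1 + iu)s}\,ds + \int_0^\infty e^{-(1 - iu)s}\,ds = \frac{1 - e^{-(1 + iu)\tau}}{1 + iu} + \frac{1}{1 - iu}.$$

The first term depends on $\tau$, so the right member of (3.1) does not
separate into a product of two transforms. Had $\varphi$ vanished for
negative argument, only the second term, which is free of $\tau$, would
have remained.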

We replace $\varphi$ by a function which vanishes for negative $t$. This
is achieved as follows. We introduce the functions $\psi_1(t)$ and
$\psi_2(t)$ such that

$$\psi_1(t) = 0, \qquad t < 0, \tag{3.2}$$
$$\psi_2(t) = 0, \qquad t > 0, \tag{3.3}$$
$$\varphi(t) = \int_{-\infty}^{\infty} \psi_2(\tau)\psi_1(t - \tau)\,d\tau. \tag{3.4}$$

Of course it is necessary to show that this is possible. This we shall do
later. Using (3.4) in (2.2) and also using (3.3), we get


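$$\chi(t + h) = \int_{-\infty}^{0} \psi_2(\tau)\,d\tau \int_0^\infty \psi_1(t - \tau - s)K(s)\,ds, \qquad t > 0. \tag{3.5}$$

Now, if it is possible to find an $\alpha(t)$ such that

$$\chi(t) = \int_{-\infty}^{0} \alpha(t - \tau)\psi_2(\tau)\,d\tau, \tag{3.6}$$

then (3.5) becomes

$$\int_{-\infty}^{0} \alpha(t + h - \tau)\psi_2(\tau)\,d\tau = \int_{-\infty}^{0} \psi_2(\tau)\,d\tau \int_0^\infty \psi_1(t - \tau - s)K(s)\,ds, \qquad t > 0.$$

From this we find

$$\int_{-\infty}^{0} \psi_2(\tau)\left[\alpha(t + h - \tau) - \int_0^\infty \psi_1(t - \tau - s)K(s)\,ds\right]d\tau = 0, \qquad t \geq 0.$$

Clearly this equation, and therefore also (2.2), will hold if

$$\alpha(t + h - \tau) - \int_0^\infty \psi_1(t - \tau - s)K(s)\,ds = 0, \qquad t > 0,\ \tau < 0.$$

Or, since $t - \tau > 0$, the above is equivalent to

$$\alpha(t + h) = \int_0^\infty \psi_1(t - s)K(s)\,ds, \qquad t > 0. \tag{3.7}$$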


Therefore, we have only to solve (3.7) for $K(\tau)$ in order to minimize
$I$. The equation (3.7) has the same form as (2.2) except that
$\psi_1(t) = 0$, $t < 0$, and consequently (3.7) will yield to the Fourier
transform method.

Of course everything depends on our being able to find a $\psi_1$ and
$\psi_2$ satisfying (3.2), (3.3) and (3.4). We observe that (3.4) is an
integral equation of the ordinary convolution type which we want to solve
for two functions $\psi_1$ and $\psi_2$ subject to auxiliary conditions
(3.2) and (3.3). Since (3.4) and (3.6) involve ordinary convolutions,
they can be simplified by use of the Fourier transform.
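
That such a pair of one-sided functions can exist is easy to check
numerically in a simple case. The sketch below assumes, purely for
illustration, the autocorrelation exp(-|t|) together with the closed
forms psi1(t) = sqrt(2) exp(-t) for t > 0 and psi2(t) = sqrt(2) exp(t)
for t < 0; none of these choices comes from the text, and the function
names are hypothetical. It approximates the convolution in (3.4) by the
midpoint rule and compares it with phi.

```python
import math

# Hypothetical example (not from the text): phi(t) = exp(-|t|).


def phi(t):
    return math.exp(-abs(t))


def psi1(t):
    # Vanishes for negative argument, as required by (3.2).
    return math.sqrt(2.0) * math.exp(-t) if t > 0 else 0.0


def psi2(t):
    # Vanishes for positive argument, as required by (3.3).
    return math.sqrt(2.0) * math.exp(t) if t < 0 else 0.0


def convolution(t, lower=-40.0, steps=40000):
    """Midpoint-rule approximation of the right side of (3.4).

    Since psi2 vanishes for tau > 0, the integral runs over (lower, 0);
    the truncation at `lower` neglects a tail of order exp(2 * lower).
    """
    d = -lower / steps
    total = 0.0
    for k in range(steps):
        tau = lower + (k + 0.5) * d
        total += psi2(tau) * psi1(t - tau)
    return total * d


if __name__ == "__main__":
    for t in (-1.0, 0.0, 0.5, 2.0):
        print(t, convolution(t), phi(t))
```

The two columns printed by the usage loop should agree to within the
quadrature error, for negative as well as positive t.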


4 The Factorization Problem 


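We proceed now to find $\psi_1$, $\psi_2$, and $\alpha$. Once this is done,
solving (3.7) for $K$ will be a simple and routine Fourier transform
problem. Let

$$\int_{-\infty}^{\infty} \varphi(t)e^{iut}\,dt = \Phi(u). \tag{4.0}$$

Then by the Fourier transform theorem

$$\varphi(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Phi(u)e^{-iut}\,du. \tag{4.1}$$

Similarly if

$$\int_0^\infty \psi_1(t)e^{iut}\,dt = \Psi_1(u), \tag{4.2}$$

then

$$\psi_1(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_1(u)e^{-iut}\,du, \tag{4.3}$$

and an analogous result holds for $\psi_2(t)$. Multiplying (3.4) by
$e^{iut}$ and integrating, we have

$$\Phi(u) = \int_{-\infty}^{\infty} \psi_2(\tau)\,d\tau \int_{-\infty}^{\infty} e^{iut}\psi_1(t - \tau)\,dt.$$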


Setting $t - \tau = s$, we get

$$\Phi(u) = \Psi_1(u)\Psi_2(u). \tag{4.4}$$

The equation (4.2) gives us, if $w = u + iv$,

$$\Psi_1(w) = \int_0^\infty \psi_1(t)e^{iut}e^{-vt}\,dt, \tag{4.5}$$

and

$$\Psi_1'(w) = i\int_0^\infty t\,\psi_1(t)e^{iut}e^{-vt}\,dt. \tag{4.6}$$

In the upper half-plane $v > 0$, $\Psi_1'(w)$ is determined as a finite
function, since for $v > 0$, the term $e^{-vt}$ in (4.6) assures the
convergence of the integral. Thus $\Psi_1(w)$ defined by (4.5) is an
analytic function of $w$ in the upper half-plane $v > 0$. Also $\Psi_1(w)$
is a bounded function in the upper half-plane $v \geq 0$. We observe then
that the Fourier transform of a function vanishing over $(-\infty, 0)$ is
analytic and bounded in the upper half-plane. Also the Fourier transform
of a function vanishing over $(0, \infty)$ is analytic and bounded in the
lower half-plane.

The converse of this result is also true. For suppose that $\Psi_1(w)$ is
analytic and bounded in the upper half-plane. Then the integral

$$\psi_1(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_1(w)e^{-iwt}\,dw$$

can be shown to be zero for $t < 0$ simply by closing the path of
integration in the upper half-plane and using Cauchy's integral theorem.
The fact that $e^{vt}$ is small for $t < 0$ and large $v$ makes this step
legitimate. Thus we conclude that if a function is analytic and bounded
in the upper half-plane its Fourier transform vanishes over
$(-\infty, 0)$. Similarly the Fourier transform of a function analytic and
bounded in the lower half-plane vanishes over $(0, \infty)$.

Combining this fact with (4.4), $\Phi(u) = \Psi_1(u)\Psi_2(u)$, we see that
the problem of finding $\psi_1(t)$ is reduced to the problem of factoring
$\Phi(u)$ into two factors, $\Psi_1(u)$ and $\Psi_2(u)$, such that
$\Psi_1(u + iv)$ is analytic and bounded in the upper half-plane $v > 0$,
and $\Psi_2(u + iv)$ is analytic and bounded in the lower half-plane,
$v < 0$.

Before attempting to factor $\Phi(u)$, we observe that $\Phi(u) \geq 0$.
[This result is established in Wiener's theory of generalized harmonic
analysis, Acta Mathematica, Vol. 55, pp. 117-258, 1930. In fact $\Phi(u)$
is the density of the energy of $f(t)$ at frequency $u$. Thus it cannot be
negative.] If $\Psi_1(u + iv) = P(u, v) + iQ(u, v)$ is analytic for
$v > 0$, then it follows at once from the Cauchy-Riemann equations that
$\Psi_2(u + iv) = P(u, -v) - iQ(u, -v)$ is analytic for $v < 0$. Moreover,
using the bar to denote the conjugate complex number, we see that
$\overline{\Psi_1(u)} = \Psi_2(u)$, so that $\Psi_1(u)\Psi_2(u) \geq 0$. In
other words, by choosing $\Psi_2(u + iv)$ as $P(u, -v) - iQ(u, -v)$, we
satisfy the requirement $\Phi(u) \geq 0$. Moreover, since

$$\Psi_1(u)\Psi_2(u) = |\Psi_1(u)|^2 = \Phi(u)$$

we see that

$$|\Psi_1(u)| = \sqrt{\Phi(u)}. \tag{4.7}$$

Thus the problem of finding $\psi_1(t)$ has now been reduced to the problem
of finding $\Psi_1(u + iv)$ analytic and bounded in the upper half-plane
$v \geq 0$, knowing the value of $|\Psi_1(u)|$.
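
For a concrete instance (assuming, purely for illustration, the spectrum
$\Phi(u) = 2/(1 + u^2)$, which corresponds to $\varphi(t) = e^{-|t|}$ and
is not an example from the text), the factorization can be written down
by inspection:

$$\Psi_1(w) = \frac{\sqrt{2}}{1 - iw}, \qquad \Psi_2(w) = \frac{\sqrt{2}}{1 + iw}.$$

The only singularity of $\Psi_1$ is the pole at $w = -i$, and for
$w = u + iv$ with $v \geq 0$,

$$|\Psi_1(u + iv)| = \frac{\sqrt{2}}{\sqrt{(1 + v)^2 + u^2}} \leq \sqrt{2},$$

so $\Psi_1$ is analytic and bounded in the upper half-plane; in the same
way $\Psi_2$ is analytic and bounded in the lower half-plane. On the real
axis $\Psi_2(u) = \overline{\Psi_1(u)}$ and
$\Psi_1(u)\Psi_2(u) = 2/(1 + u^2) = \Phi(u)$, so that
$|\Psi_1(u)| = \sqrt{\Phi(u)}$ in agreement with (4.7).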


5 The Functions $\psi_1$, $\psi_2$, and $\alpha$

We introduce the function

$$\lambda(w) = \log \Psi_1(w).$$

If we write $\lambda(w)$ in terms of its real and imaginary parts, then
$\lambda(u + iv) = p(u, v) + iq(u, v)$. The requirements on $\Psi_1$ are
certainly fulfilled if $\lambda(w)$ is analytic for $v > 0$, if
$e^{p(u, v)}$ is bounded for $v > 0$, and if

$$p(u, 0) = \tfrac{1}{2}\log \Phi(u). \tag{5.0}$$

This last requirement is (4.7). The condition that $\lambda(w)$ be analytic
for $v > 0$ is equivalent to the condition that $p(u, v)$ be a harmonic
function for $v > 0$. In this way all our requirements can be specified in
terms of $p(u, v)$.

The determination of the harmonic function, $p(u, v)$, taking on
specified values on the real axis, as is indicated in (5.0), is well
known. In fact

$$p(u, v) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{v\log \Phi(s)}{(u - s)^2 + v^2}\,ds \tag{5.1}$$

will be harmonic‖ and will satisfy (5.0). The integral (5.1) is the well
known Poisson integral, and $e^{p(u, v)}$ is a bounded function for
$v > 0$. Thus all requirements on $p(u, v)$ are fulfilled. We shall also
have occasion to use the fact that, with $p(u, v)$ determined by (5.1),
$e^{-p(u, v)}$ is very limited in magnitude for $v > 0$. (It is certainly
$O(e^{\varepsilon|w|})$ for any $\varepsilon > 0$.) If $R$ denotes "real
part of," then

$$\frac{v}{(u - s)^2 + v^2} = R\left\{\frac{i}{w - s}\right\}.$$

Thus (5.1) can be written as

$$p(u, v) = R\left\{\frac{i}{2\pi}\int_{-\infty}^{\infty} \frac{\log \Phi(s)}{w - s}\,ds\right\}.$$


‖ If the integral (5.1) diverges, then Wiener has shown $f(t)$ can be
predicted perfectly from its past.

Taking account of the fact that $\Phi(s)$ is an even function, we have

$$p(u, v) = R\left\{\frac{i}{\pi}\int_0^\infty \frac{w\log \Phi(s)}{w^2 - s^2}\,ds\right\}. \tag{5.2}$$

Since $p(u, v) = R\{\lambda(w)\}$, it follows from (5.2) that

$$\lambda(w) = \frac{i}{\pi}\int_0^\infty \frac{w\log \Phi(s)}{w^2 - s^2}\,ds. \tag{5.3}$$

We recall that

$$\Psi_1(w) = e^{\lambda(w)}.$$

Thus we have completely determined $\Psi_1$, and with it, from (4.3), also
$\psi_1(t)$. Not only is $\Psi_1(w)$ analytic and bounded in the upper
half-plane, but $1/\Psi_1(w) = e^{-\lambda(w)}$ is also analytic in the
upper half-plane. This last relationship, which appears to be incidental
here, is in fact a basic requirement as we shall see.
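
The Poisson integral (5.1) lends itself to a direct numerical check. The
sketch below assumes, for illustration only, the spectrum 2 / (1 + s^2)
and its upper half-plane factor sqrt(2) / (1 - i w); the spectrum, the
substitution used for the quadrature, and the function names are all
choices made here rather than taken from the text. It compares p(u, v)
computed from (5.1) with log |Psi1(u + i v)|.

```python
import math

# Hypothetical spectrum (an assumption): Phi(s) = 2 / (1 + s^2), whose factor
# analytic in the upper half-plane is Psi1(w) = sqrt(2) / (1 - i w).


def log_phi(s):
    return math.log(2.0 / (1.0 + s * s))


def poisson_p(u, v, steps=20000):
    """Evaluate (5.1) for v > 0 with the substitution s = u + v tan(theta).

    Under this change of variable v ds / ((u - s)^2 + v^2) = d(theta), so
    p(u, v) is 1 / (2 pi) times the integral of log Phi(u + v tan(theta))
    over (-pi/2, pi/2). The midpoint rule never samples the endpoints,
    where the integrand has an integrable logarithmic singularity.
    """
    d = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = -0.5 * math.pi + (k + 0.5) * d
        total += log_phi(u + v * math.tan(theta))
    return total * d / (2.0 * math.pi)


def log_abs_psi1(u, v):
    """log |Psi1(u + i v)| for the closed-form factor assumed above."""
    w = complex(u, v)
    return math.log(abs(math.sqrt(2.0) / (1.0 - 1j * w)))


if __name__ == "__main__":
    for u, v in ((0.0, 1.0), (1.0, 0.5), (-2.0, 3.0)):
        print(u, v, poisson_p(u, v), log_abs_psi1(u, v))
```

Agreement of the two columns reflects the statement that p(u, v) is the
real part of log Psi1(w); the imaginary part q(u, v) is not computed in
this sketch.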

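We turn next to (3.6),

$$\chi(t) = \int_{-\infty}^{0} \psi_2(\tau)\alpha(t - \tau)\,d\tau. \tag{5.4}$$

Introducing the Fourier transforms,

$$A(u) = \int_{-\infty}^{\infty} \alpha(t)e^{iut}\,dt, \quad\text{and}\quad X(u) = \int_{-\infty}^{\infty} \chi(t)e^{iut}\,dt,$$

we get from (5.4) $X(u) = A(u)\Psi_2(u)$. Thus $A(u) = X(u)/\Psi_2(u)$ is
determined. We find $\alpha(t)$ from

$$\alpha(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} A(u)e^{-iut}\,du.$$

Thus $\psi_1(t)$, $\psi_2(t)$, and $\alpha(t)$ are determined.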


6 The Prediction Operator 
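We return now to Eq. (3.7),

$$\alpha(t + h) = \int_0^\infty \psi_1(t - \tau)K(\tau)\,d\tau, \qquad t \geq 0.$$

Multiplying the equation by $e^{iwt}$ and integrating for $t \geq 0$, we
find

$$\int_0^\infty \alpha(t + h)e^{iwt}\,dt = \int_0^\infty K(\tau)\,d\tau \int_0^\infty \psi_1(t - \tau)e^{iwt}\,dt.$$

Or setting $t - \tau = s$, and recalling that $\psi_1(s) = 0$ for $s < 0$,

$$\int_0^\infty \alpha(t + h)e^{iwt}\,dt = \int_0^\infty K(\tau)e^{iw\tau}\,d\tau \int_0^\infty \psi_1(s)e^{iws}\,ds. \tag{6.0}$$

If for $v \geq 0$,

$$k(w) = \int_0^\infty K(\tau)e^{iw\tau}\,d\tau, \tag{6.1}$$

where $w = u + iv$, then we have from (6.0)

$$k(w) = \frac{1}{\Psi_1(w)}\int_0^\infty \alpha(t + h)e^{iwt}\,dt. \tag{6.2}$$

We have thus found the Fourier transform of $K(\tau)$.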

In order that $K(\tau)$ as determined from its Fourier transform, $k(w)$,
shall be null for $\tau < 0$, it is necessary for $k(w)$ to be analytic
and of limited growth in the upper half $w$-plane. Now the integral on
the right side of (6.2) defines a bounded analytic function for
$v \geq 0$. As has already been indicated by the remark in the paragraph
following (5.1), $1/\Psi_1(w)$ is limited in magnitude in the upper
half-plane. Thus $k(w)$ as given by (6.2) is analytic and limited in
magnitude in the upper half-plane [$O(e^{\varepsilon|w|})$ for any
$\varepsilon > 0$]. We may conclude, therefore, that $K(\tau)$ as
determined from $k(w)$ will be null for $\tau < 0$. Strictly speaking,
$k(w)$ often will not have a Fourier transform in the orthodox sense but
is rather the transform of an operator which deals only with the interval
$(0, \infty)$.
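
A case in which k(w) is the transform of an operator rather than of an
ordinary function can be worked numerically. The sketch below rests on
assumptions that are not in the text: pure prediction (chi equal to phi,
so that alpha coincides with psi1), the autocorrelation exp(-|t|) with
psi1(t) = sqrt(2) exp(-t) for t > 0, and a hypothetical lead H. Under
these assumptions (6.2) should reduce to the constant exp(-H), which
corresponds to multiplying the present value f(t) by exp(-H).

```python
import cmath
import math

H = 0.7  # hypothetical prediction lead


def alpha(t):
    """alpha(t) = psi1(t) = sqrt(2) exp(-t) for t > 0 (assumed example)."""
    return math.sqrt(2.0) * math.exp(-t) if t > 0 else 0.0


def psi1_transform(w):
    """Psi1(w) = sqrt(2) / (1 - i w), the transform (4.5) of psi1."""
    return math.sqrt(2.0) / (1.0 - 1j * w)


def k_of_w(w, upper=40.0, steps=40000):
    """Right side of (6.2), with the integral done by the midpoint rule.

    The integral over (0, infinity) is truncated at `upper`; for v >= 0
    the neglected tail is of order exp(-upper).
    """
    d = upper / steps
    total = 0.0 + 0.0j
    for n in range(steps):
        t = (n + 0.5) * d
        total += alpha(t + H) * cmath.exp(1j * w * t)
    return total * d / psi1_transform(w)


if __name__ == "__main__":
    for w in (0.0, 1.0 + 0.5j, -3.0 + 2.0j):
        print(w, k_of_w(w), math.exp(-H))
```

A constant k(w) is not the transform of an ordinary function K, which is
the situation the closing remark of the paragraph above has in mind.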

For example the best representation of $g(t + h)$ may be given by $f'(t)$
and not by $\int_0^\infty f(t - \tau)K(\tau)\,d\tau$ at all. In this case
if we apply $\left[-\frac{d}{d\tau}\right]_{\tau=0}$ to $f(t - \tau)$ we
get $f'(t)$ as desired. If we apply
$\left[-\frac{d}{d\tau}\right]_{\tau=0}$ to $e^{iw\tau}$ we get $-iw$.
Thus in this case we would find $k(w) = -iw$.

Again if $f(t - a)$ should be used in the representation of $g(t)$ this is
the result of taking $[f(t - \tau)]_{\tau=a}$. Doing this to $e^{iw\tau}$,
we get $e^{iwa}$. More generally then, if we find as a result of using
(6.2) that

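$$k(w) = -w^2 + 2iwe^{iw} - e^{2iw} + \frac{i}{w + i},$$

then in place of an operation on $f(t - \tau)$ of just the form
$\int_0^\infty K(\tau)\,d\tau$ we get

$$\left[\frac{d^2}{d\tau^2}f(t - \tau)\right]_{\tau=0} + 2\left[\frac{d}{d\tau}f(t - \tau)\right]_{\tau=1} - \left[f(t - \tau)\right]_{\tau=2} + \int_0^\infty f(t - \tau)e^{-\tau}\,d\tau$$

$$= f''(t) - 2f'(t - 1) - f(t - 2) + \int_0^\infty f(t - s)e^{-s}\,ds.$$

We can check this by using $e^{iw\tau}$ in place of $f(t - \tau)$, getting

$$-w^2 + 2iwe^{iw} - e^{2iw} + \int_0^\infty e^{iw\tau}e^{-\tau}\,d\tau,$$

which is in fact $k(w)$.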
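
The two elementary operators used in this section, the derivative at
tau = 0 and evaluation at tau = a, can be checked in the same spirit. The
following is a minimal sketch; the helper names, the finite-difference
step and the test values are hypothetical choices made here. It applies
each operator to exp(i w tau) to recover the corresponding k(w), and
applies the derivative operator to f(t - tau) with f taken to be the
sine function.

```python
import cmath
import math


def neg_derivative_at_zero(func, step=1e-5):
    """Central-difference approximation of [-d/dtau] at tau = 0 applied to func."""
    return -(func(step) - func(-step)) / (2.0 * step)


def evaluate_at(func, a):
    """The operator that takes func(tau) to its value at tau = a."""
    return func(a)


def symbol(operator, w):
    """Apply an operator to tau -> exp(i w tau); the result plays the role of k(w)."""
    return operator(lambda tau: cmath.exp(1j * w * tau))


if __name__ == "__main__":
    w = 1.3 + 0.2j
    print(symbol(neg_derivative_at_zero, w), -1j * w)
    print(symbol(lambda f: evaluate_at(f, 2.0), w), cmath.exp(2.0j * w))
    t = 0.4
    print(neg_derivative_at_zero(lambda tau: math.sin(t - tau)), math.cos(t))
```

Each printed pair should agree up to the finite-difference error, which
mirrors the check by substitution of the exponential carried out in the
text.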